Pandas `DataFrame.compare()` function is used to compare two DataFrames element by element and show where they differ. Its `align_axis` parameter controls whether the differences are laid out side by side (column-wise) or stacked (row by row). Sometimes we have two DataFrames holding the same data with slight changes, and in those situations we need to see exactly what differs between them. By default, `compare()` places the differences side by side, column-wise. It can only compare DataFrames that are identically labeled, meaning they have the same shape and the same row index and column labels.

In this article, I will explain the `compare()` function, its syntax, and its parameters, and show with examples how to compare two DataFrames row by row.
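The identical-labels requirement is easy to check. Below is a minimal sketch using two small, hypothetical DataFrames (`left` and `right`, not part of this article's data) where one has an extra row. pandas raises a `ValueError` here rather than aligning the frames (the exact message may vary between versions). Reindexing both frames to a shared index is shown only as one possible workaround.

```python
import pandas as pd

# Hypothetical frames: same column, but `right` has one extra row
left = pd.DataFrame({"Fee": [20000, 25000]})
right = pd.DataFrame({"Fee": [20000, 25000, 30000]})

try:
    left.compare(right)
    raised = False
except ValueError as err:
    # compare() refuses DataFrames whose index or columns differ
    raised = True
    print("compare() failed:", err)

# One possible workaround: reindex both frames to the union of their indexes,
# so the missing row becomes NaN in `left` and then counts as a difference
common = left.index.union(right.index)
diff = left.reindex(common).compare(right.reindex(common))
print(diff)
```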

## 1. Quick Examples of Compare Two DataFrames Row by Row

If you are in a hurry, below are some quick examples of comparing two DataFrames row by row.

```
# Below are quick examples
# Example 1: Compare two DataFrames row by row
diff = df.compare(df1, align_axis=0)
# Example 2: Show equal values instead of NaN with keep_equal=True
diff = df.compare(df1, keep_equal=True, align_axis=0)
# Example 3: Keep all rows and columns with keep_shape=True
diff = df.compare(df1, keep_shape=True, align_axis=0)
# Example 4: Keep all rows and columns and show equal values
diff = df.compare(df1, keep_equal=True, keep_shape=True, align_axis=0)
```

## 2. Syntax of Pandas df.compare()

Following is the syntax of the Pandas compare() function.

```
# Following is the syntax of compare() function
DataFrame.compare(other, align_axis=1, keep_shape=False, keep_equal=False, result_names=('self', 'other'))
```

### 2.1 Parameters

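Following are the parameters of the `compare()` function.

- `other`: DataFrame. The object to compare with the calling DataFrame.
- `align_axis`: {0 or 'index', 1 or 'columns'}, default `1`. Determines which axis the comparison is aligned on. With `1`, the differences are placed side by side, with columns drawn alternately from each DataFrame. With `0`, the differences are stacked vertically, with rows drawn alternately from each DataFrame.
- `keep_shape`: bool, default `False`. If `True`, all rows and columns are kept. Otherwise, only the rows and columns that contain differences are kept.
- `keep_equal`: bool, default `False`. If `True`, equal values are kept in the result. Otherwise, equal values are shown as `NaN`.
- `result_names`: tuple, default `('self', 'other')`. The names used for the two DataFrames in the result (available from pandas 1.5.0).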
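As a quick illustration of the default `align_axis=1` layout and of `result_names`, here is a small sketch on two hypothetical DataFrames (`old` and `new`). It assumes pandas 1.5.0 or later, since `result_names` is not available in earlier versions.

```python
import pandas as pd

# Hypothetical frames that differ in a single cell (row 1, Fee)
old = pd.DataFrame({"Fee": [20000, 25000], "Discount": [1000, 2500]})
new = pd.DataFrame({"Fee": [20000, 24000], "Discount": [1000, 2500]})

# Default align_axis=1: the two versions sit side by side under each column;
# result_names replaces the default 'self'/'other' labels
diff = old.compare(new, result_names=("old", "new"))
print(diff)
print(diff.columns.tolist())
```

Only row `1` and the `Fee` column remain, because that is the only cell that differs, and the custom names appear in the innermost column level.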

### 2.2 Return Value

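It returns a DataFrame showing the values that differ between the two DataFrames. With `align_axis=1` (the default), the result has a column MultiIndex with `'self'` and `'other'` (or the names given in `result_names`) at the innermost level. With `align_axis=0`, they appear at the innermost level of the row index instead.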
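To see where the `'self'` and `'other'` labels end up for each `align_axis` value, the sketch below compares two small, hypothetical DataFrames (`a` and `b`) both ways and prints the resulting labels.

```python
import pandas as pd

# Hypothetical frames that differ only in row 1
a = pd.DataFrame({"Courses": ["Spark", "NumPY"], "Fee": [20000, 25000]})
b = pd.DataFrame({"Courses": ["Spark", "Hadoop"], "Fee": [20000, 24000]})

by_columns = a.compare(b)             # align_axis=1 (default): labels in the columns
by_rows = a.compare(b, align_axis=0)  # align_axis=0: labels in the row index

print(by_columns.columns.tolist())
print(by_rows.index.tolist())
```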

**Create DataFrame**

Now, let's create two pandas DataFrames using data from Python dictionaries, where the columns are `Courses`, `Fee`, `Duration`, and `Discount`.

```
# Create DataFrame
import pandas as pd

technologies = {
    'Courses': ["Spark", "NumPY", "pandas", "Java", "PySpark"],
    'Fee': [20000, 25000, 30000, 22000, 26000],
    'Duration': ['30days', '40days', '35days', '60days', '50days'],
    'Discount': [1000, 2500, 1500, 1200, 3000]
}
technologies1 = {
    'Courses': ["Spark", "Hadoop", "pandas", "Java", "PySpark"],
    'Fee': [20000, 24000, 30000, 22000, 21000],
    'Duration': ['30days', '40days', '35days', '60days', '50days'],
    'Discount': [1000, 2500, 1500, 1200, 3000]
}
df = pd.DataFrame(technologies)
print("DataFrame1:\n", df)
df1 = pd.DataFrame(technologies1)
print("DataFrame2:\n", df1)
```

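Printing both DataFrames shows that they match except in two places: row `1`, where `df` has `NumPY` and `25000` while `df1` has `Hadoop` and `24000`, and the `Fee` of row `4` (`26000` versus `21000`).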

## 3. Usage of Pandas DataFrame.compare() Function

Pandas `DataFrame.compare()` function compares two DataFrames of equal size and identical labels element by element. Passing `align_axis=0` stacks the differences row by row and returns a DataFrame containing only the values that differ. By default (`align_axis=1`), the differences are placed side by side, column by column. To get a result that keeps every row and column, use the `keep_shape` parameter, and to show equal values instead of `NaN` in the result, use the `keep_equal` parameter.

```
# Comparing the two DataFrames row by row
diff = df.compare(df1, align_axis=0)
print(" After comparing two DataFrames:\n", diff)
```

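The result keeps only rows `1` and `4` and only the `Courses` and `Fee` columns, because those are the only places where the DataFrames differ. Each row label appears twice, once for `self` (values from `df`) and once for `other` (values from `df1`), and cells that are equal in both DataFrames, such as the `Courses` value of row `4`, are shown as `NaN`.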
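The rule that decides which rows and columns appear can be approximated in a few lines. This is a simplified sketch of the idea, not pandas' own source: a cell counts as different unless both values are equal or both are missing, and with the default `keep_shape=False` only rows and columns containing at least one such cell are kept. The article's DataFrames are rebuilt here so the snippet runs on its own.

```python
import pandas as pd

# Same data as in the "Create DataFrame" step, rebuilt so this runs standalone
df = pd.DataFrame({
    "Courses": ["Spark", "NumPY", "pandas", "Java", "PySpark"],
    "Fee": [20000, 25000, 30000, 22000, 26000],
    "Duration": ["30days", "40days", "35days", "60days", "50days"],
    "Discount": [1000, 2500, 1500, 1200, 3000],
})
df1 = df.copy()
df1.loc[1, "Courses"] = "Hadoop"
df1.loc[1, "Fee"] = 24000
df1.loc[4, "Fee"] = 21000

# Simplified sketch (not pandas' source): a cell differs unless the values
# are equal or both are NaN
differs = ~((df == df1) | (df.isna() & df1.isna()))

# With keep_shape=False, only rows/columns containing a difference are kept
rows_kept = df.index[differs.any(axis=1)].tolist()
cols_kept = df.columns[differs.any(axis=0)].tolist()
print(rows_kept, cols_kept)  # [1, 4] ['Courses', 'Fee']

# The real result uses the same rows and columns, with self/other per row
diff = df.compare(df1, align_axis=0)
print(diff.index.tolist())
```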

## 4. Pass keep_equal into compare() & Compare

As the previous result shows, values that are equal in both DataFrames are displayed as `NaN`. To display the actual values instead, set `keep_equal` to `True` when calling `compare()`. Equal cells then show their values from both DataFrames, while the set of rows and columns in the result stays the same.

```
# Show equal values instead of NaN with keep_equal=True
diff = df.compare(df1, keep_equal=True, align_axis=0)
print(" After comparing two DataFrames:\n", diff)
```

Yields below output.

```
# Output:
# After comparing two DataFrames:
         Courses    Fee
1 self     NumPY  25000
  other   Hadoop  24000
4 self   PySpark  26000
  other  PySpark  21000
```
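A side effect worth knowing: with the default `keep_equal=False`, equal cells are filled with `NaN`, which turns the integer `Fee` column into floats (hence values such as `25000.0` in the default output). With `keep_equal=True` no `NaN` is inserted in this example, so `Fee` keeps an integer dtype. The sketch below checks that, rebuilding the article's DataFrames so it runs standalone.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

# Same data as in the "Create DataFrame" step, rebuilt so this runs standalone
df = pd.DataFrame({
    "Courses": ["Spark", "NumPY", "pandas", "Java", "PySpark"],
    "Fee": [20000, 25000, 30000, 22000, 26000],
    "Duration": ["30days", "40days", "35days", "60days", "50days"],
    "Discount": [1000, 2500, 1500, 1200, 3000],
})
df1 = df.copy()
df1.loc[1, "Courses"] = "Hadoop"
df1.loc[1, "Fee"] = 24000
df1.loc[4, "Fee"] = 21000

default = df.compare(df1, align_axis=0)
equal_kept = df.compare(df1, keep_equal=True, align_axis=0)

# NaN placeholders for equal cells force the integer Fee column to float
print(default["Fee"].dtype)
# No NaN is inserted with keep_equal=True, so Fee stays an integer column
print(equal_kept["Fee"].dtype)
```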

## 5. Pass keep_shape into compare() & Compare Pandas Row by Row

If we want a result that keeps every row and column of the original DataFrames, we can set `keep_shape` to `True` and pass it to the `compare()` function. Rows and columns without any differences are then kept as well, and equal values are still shown as `NaN`. For example,

```
# Keep all rows and columns with keep_shape=True
diff = df.compare(df1, keep_shape=True, align_axis=0)
print(" After comparing two DataFrames:\n", diff)
```

Yields below output.

```
# Output:
# After comparing two DataFrames:
        Courses      Fee Duration  Discount
0 self      NaN      NaN      NaN       NaN
  other     NaN      NaN      NaN       NaN
1 self    NumPY  25000.0      NaN       NaN
  other  Hadoop  24000.0      NaN       NaN
2 self      NaN      NaN      NaN       NaN
  other     NaN      NaN      NaN       NaN
3 self      NaN      NaN      NaN       NaN
  other     NaN      NaN      NaN       NaN
4 self      NaN  26000.0      NaN       NaN
  other     NaN  21000.0      NaN       NaN
```
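With `align_axis=0`, "keeping the shape" means every original row appears twice (once as `self`, once as `other`) and every column is retained. A short sketch, rebuilding the article's DataFrames, makes that concrete by comparing the shapes.

```python
import pandas as pd

# Same data as in the "Create DataFrame" step, rebuilt so this runs standalone
df = pd.DataFrame({
    "Courses": ["Spark", "NumPY", "pandas", "Java", "PySpark"],
    "Fee": [20000, 25000, 30000, 22000, 26000],
    "Duration": ["30days", "40days", "35days", "60days", "50days"],
    "Discount": [1000, 2500, 1500, 1200, 3000],
})
df1 = df.copy()
df1.loc[1, "Courses"] = "Hadoop"
df1.loc[1, "Fee"] = 24000
df1.loc[4, "Fee"] = 21000

shaped = df.compare(df1, keep_shape=True, align_axis=0)
print(df.shape, shaped.shape)  # (5, 4) (10, 4)

# Duration and Discount never differ, so every cell in them is NaN
print(shaped[["Duration", "Discount"]].isna().all().all())  # True
```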

## 6. Pass keep_equal & keep_shape into compare()

Set both `keep_shape` and `keep_equal` to `True` and pass them to the `compare()` function. It returns a result that keeps every row and column and shows the actual values from both DataFrames, including the equal ones.

```
# Get differences of DataFrames keep equal values and shape
diff = df.compare(df1, keep_equal=True, keep_shape=True, align_axis=0)
print(" After comparing two DataFrames:\n", diff)
```

Yields below output.

```
# Output:
# After comparing two DataFrames:
         Courses    Fee Duration  Discount
0 self     Spark  20000   30days      1000
  other    Spark  20000   30days      1000
1 self     NumPY  25000   40days      2500
  other   Hadoop  24000   40days      2500
2 self    pandas  30000   35days      1500
  other   pandas  30000   35days      1500
3 self      Java  22000   60days      1200
  other     Java  22000   60days      1200
4 self   PySpark  26000   50days      3000
  other  PySpark  21000   50days      3000
```
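With both options enabled, the result is simply the two DataFrames interleaved row by row, so each original can be recovered by selecting one value of the inner index level. The sketch below does that with `DataFrame.xs()` on the rebuilt article data; using `xs()` here is just one convenient way to select the level.

```python
import pandas as pd

# Same data as in the "Create DataFrame" step, rebuilt so this runs standalone
df = pd.DataFrame({
    "Courses": ["Spark", "NumPY", "pandas", "Java", "PySpark"],
    "Fee": [20000, 25000, 30000, 22000, 26000],
    "Duration": ["30days", "40days", "35days", "60days", "50days"],
    "Discount": [1000, 2500, 1500, 1200, 3000],
})
df1 = df.copy()
df1.loc[1, "Courses"] = "Hadoop"
df1.loc[1, "Fee"] = 24000
df1.loc[4, "Fee"] = 21000

both = df.compare(df1, keep_equal=True, keep_shape=True, align_axis=0)

# Select each half by the inner ('self'/'other') level of the row index
self_part = both.xs("self", level=1)
other_part = both.xs("other", level=1)
print(self_part.equals(df), other_part.equals(df1))
```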

## 7. Frequently Asked Questions

**What is the purpose of the compare() method in Pandas?**

The `compare()` method in pandas compares two identically labeled DataFrames and highlights the differences between them. It provides a convenient way to find the cells whose values differ between the two DataFrames.

**How does the compare() method display differences?**

The `compare()` method displays differences in a tabular format. With the default `align_axis=1`, each compared column gets two sub-columns in a column MultiIndex, `'self'` and `'other'`, holding the values from the first and second DataFrames, respectively. With `align_axis=0`, `'self'` and `'other'` appear in the row index instead. Differing values are displayed, and equal values are shown as `NaN` unless `keep_equal=True` is set.

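**How can I include rows with equal values in the result?**

By default, `compare()` drops rows and columns that contain no differences. To keep them, set the `keep_shape` parameter to `True`. Their cells are still shown as `NaN` unless you also set `keep_equal` to `True`, which displays the actual values of equal cells instead. On its own, `keep_equal=True` only fills in equal cells within the rows and columns already in the result; it does not add fully equal rows.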

**Can compare() handle DataFrames with different shapes?**

No. The `compare()` method requires identically labeled DataFrames, that is, the same shape with the same row index and column labels, and raises a `ValueError` otherwise. The `keep_shape` parameter does not change this; it only controls whether rows and columns without differences are kept in the result.

**Can I customize the behavior of the compare() method further?**

Yes. Besides `keep_equal` and `keep_shape`, you can use `align_axis` to place the differences in columns (`1`, the default) or rows (`0`), and, from pandas 1.5.0, `result_names` to replace the `'self'` and `'other'` labels. These parameters let you control which elements appear in the result and how they are laid out.

**Does the compare() method modify the original DataFrames?**

The `compare()` method does not modify the original DataFrames. It returns a new DataFrame containing the comparison results, allowing you to analyze the differences without altering the original data.

## 8. Conclusion

In this article, I have explained the `DataFrame.compare()` function, its syntax, and its parameters, and shown with multiple examples how to compare two DataFrames row by row.
