Pandas `DataFrame.compare()` function is used to show the differences between two DataFrames, either column by column or row by row. Sometimes we have two DataFrames that hold the same data with slight changes, and in those situations we need to see exactly where they differ.

By default, the `compare()` function compares two DataFrames column-wise and returns the differences side by side. It can only compare identically labeled DataFrames, that is, DataFrames with the same shape, the same row index, and the same column labels. In this article, I will explain the `compare()` function, its syntax, and its parameters, and show how to compare two DataFrames with examples.
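To illustrate the labeling requirement, here is a minimal sketch with two small hypothetical DataFrames (`left` and `right`, not part of this article's data) that hold the same labels in a different row order. `compare()` raises a `ValueError` in that case (the exact message may vary between pandas versions). Reindexing `right` to the row order of `left` first makes the comparison possible.

```python
import pandas as pd

# Hypothetical DataFrames: same labels, but the rows are in a different order
left = pd.DataFrame({"Fee": [20000, 25000]}, index=["a", "b"])
right = pd.DataFrame({"Fee": [25000, 20001]}, index=["b", "a"])

raised = False
try:
    left.compare(right)
except ValueError as exc:
    # compare() only accepts identically labeled DataFrames
    raised = True
    print("compare() raised ValueError:", exc)

# Reorder the rows of `right` to match `left`, then compare
aligned = right.reindex(left.index)
result = left.compare(aligned)
print(result)
```

After reindexing, only row `a` differs, so it is the only row in the result.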

## 1. Quick Examples of Difference Between Two DataFrames

If you are in a hurry, below are some quick examples of finding the differences between two pandas DataFrames.

```
# Quick examples of difference between two DataFrames
# Example 1: Compare two DataFrames
diff = df.compare(df1)
# Example 2: Keep equal values instead of showing them as NaN
diff = df.compare(df1, keep_equal=True)
# Example 3: Keep all rows and columns, not only the ones that differ
diff = df.compare(df1, keep_shape=True)
# Example 4: Keep both the equal values and all rows and columns
diff = df.compare(df1, keep_equal=True, keep_shape=True)
```

## 2. Syntax of DataFrame compare()

Following is the syntax of the `compare()` function to find the differences between DataFrames.

```
# Syntax of compare() function
DataFrame.compare(other, align_axis=1, keep_shape=False, keep_equal=False, result_names=('self', 'other'))
```

### 2.1 Parameters

Following are the parameters of the `compare()` function.

- `other`: DataFrame. The object to compare with the calling DataFrame.
- `align_axis`: {0 or 'index', 1 or 'columns'}, default `1`. Determines the axis on which the comparison is aligned. With `1` (columns), the differing values from the two DataFrames are placed side by side as alternating columns. With `0` (rows), they are stacked vertically as alternating rows.
- `keep_shape`: bool, default `False`. If `True`, all rows and columns are kept in the result. Otherwise, only the rows and columns that contain at least one difference are kept.
- `keep_equal`: bool, default `False`. If `True`, equal values are kept in the result. Otherwise, equal values are shown as NaN.
- `result_names`: tuple, default `('self', 'other')`. The labels used for the two DataFrames in the result (available since pandas 1.5.0).
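As a small illustration of `result_names`, the sketch below reuses two columns of the sample data created later in section 2.3 and replaces the default `self`/`other` labels with `before`/`after`. It assumes pandas 1.5.0 or later; the variable name `named` is only illustrative.

```python
import pandas as pd

# Two columns of the sample data from section 2.3 (the ones that differ)
df = pd.DataFrame({"Courses": ["Spark", "NumPy", "pandas", "Java", "PySpark"],
                   "Fee": [20000, 25000, 30000, 22000, 26000]})
df1 = pd.DataFrame({"Courses": ["Spark", "Hadoop", "pandas", "Java", "PySpark"],
                    "Fee": [20000, 24000, 30000, 22000, 21000]})

# result_names (pandas >= 1.5) relabels the two sides of the comparison
named = df.compare(df1, result_names=("before", "after"))
print(named)
```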

### 2.2 Return Value

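It returns a DataFrame that shows the differences side by side. With the default `align_axis=1`, the result has MultiIndex columns with `self` and `other` (or the names given in `result_names`) at the innermost level of the column labels; with `align_axis=0`, these labels form the innermost level of the row index instead.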
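To make the structure of the result concrete, the following sketch (using the same sample data as section 2.3) prints the column and row labels of the result and uses `DataFrame.xs()` to select only the `other` side of the innermost column level. The variable names are illustrative.

```python
import pandas as pd

# Same sample data as in section 2.3
df = pd.DataFrame({
    "Courses": ["Spark", "NumPy", "pandas", "Java", "PySpark"],
    "Fee": [20000, 25000, 30000, 22000, 26000],
    "Duration": ["30days", "40days", "35days", "60days", "50days"],
    "Discount": [1000, 2500, 1500, 1200, 3000],
})
df1 = pd.DataFrame({
    "Courses": ["Spark", "Hadoop", "pandas", "Java", "PySpark"],
    "Fee": [20000, 24000, 30000, 22000, 21000],
    "Duration": ["30days", "40days", "35days", "60days", "50days"],
    "Discount": [1000, 2500, 1500, 1200, 3000],
})

diff = df.compare(df1)

# Columns are (column label, side) tuples; Duration and Discount are absent
# because they have no differences
print(diff.columns.tolist())
# Only the row labels that contain a difference are kept
print(diff.index.tolist())
# xs() selects one side of the innermost column level
other_side = diff.xs("other", axis=1, level=1)
print(other_side)
```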

### 2.3 Create DataFrame

Now, let's create two pandas DataFrames from Python dictionaries. Both have the columns `Courses`, `Fee`, `Duration`, and `Discount`.

```
# Create DataFrames
import pandas as pd

technologies = {
    'Courses': ["Spark", "NumPy", "pandas", "Java", "PySpark"],
    'Fee': [20000, 25000, 30000, 22000, 26000],
    'Duration': ['30days', '40days', '35days', '60days', '50days'],
    'Discount': [1000, 2500, 1500, 1200, 3000]
}
technologies1 = {
    'Courses': ["Spark", "Hadoop", "pandas", "Java", "PySpark"],
    'Fee': [20000, 24000, 30000, 22000, 21000],
    'Duration': ['30days', '40days', '35days', '60days', '50days'],
    'Discount': [1000, 2500, 1500, 1200, 3000]
}
df = pd.DataFrame(technologies)
print("DataFrame1:\n", df)
df1 = pd.DataFrame(technologies1)
print("DataFrame2:\n", df1)
```

The two DataFrames differ only in row 1 (`Courses` and `Fee`) and row 4 (`Fee`); all other values are identical.

## 3. Usage of Pandas DataFrame.compare()

Pandas `DataFrame.compare()` compares two DataFrames with the same shape and labels column-wise and returns the differences. Set `align_axis=0` to show the differences row by row instead. To keep every row and column in the result, use the `keep_shape` parameter, and use the `keep_equal` parameter to show equal values instead of NaN in the resulting DataFrame.

Let's use the `compare()` function on the DataFrames above to find the differences between them.

```
# Compare two DataFrames
diff = df.compare(df1)
print("Difference between two DataFrames:\n", diff)
```

Yields below output.

```
# Output:
Difference between two DataFrames:
   Courses              Fee
     self   other     self    other
1   NumPy  Hadoop  25000.0  24000.0
4     NaN     NaN  26000.0  21000.0
```

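As we can see, only the rows (1 and 4) and columns (`Courses` and `Fee`) that contain at least one difference are kept, and the values from the two DataFrames are placed side by side. Equal values within those rows and columns, such as `Courses` in row 4, are shown as NaN, which is also why `Fee` is displayed as float.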
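The same comparison can also be laid out row by row. In this sketch, which reuses the sample data from section 2.3, `align_axis=0` turns the `self` and `other` values into alternating rows under a two-level row index instead of alternating columns. The name `diff_rows` is only illustrative.

```python
import pandas as pd

# Same sample data as in section 2.3
df = pd.DataFrame({
    "Courses": ["Spark", "NumPy", "pandas", "Java", "PySpark"],
    "Fee": [20000, 25000, 30000, 22000, 26000],
    "Duration": ["30days", "40days", "35days", "60days", "50days"],
    "Discount": [1000, 2500, 1500, 1200, 3000],
})
df1 = pd.DataFrame({
    "Courses": ["Spark", "Hadoop", "pandas", "Java", "PySpark"],
    "Fee": [20000, 24000, 30000, 22000, 21000],
    "Duration": ["30days", "40days", "35days", "60days", "50days"],
    "Discount": [1000, 2500, 1500, 1200, 3000],
})

# Stack the differences vertically: each differing row appears twice,
# once for 'self' (df) and once for 'other' (df1)
diff_rows = df.compare(df1, align_axis=0)
print(diff_rows)
```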

## 4. Use keep_equal to Get Pandas Difference

In the above example, equal values in the result are shown as NaN. To show the actual values instead, set `keep_equal=True` when calling `compare()`. The equal values from the given DataFrames then replace the NaN values.

```
# Keep equal values instead of replacing them with NaN
diff = df.compare(df1, keep_equal=True)
print(diff)
```

Yields below output.

```
# Output:
   Courses             Fee
      self    other   self  other
1    NumPy   Hadoop  25000  24000
4  PySpark  PySpark  26000  21000
```

## 5. Using keep_shape to Get Pandas Differences

To keep all rows and columns in the result, not only those that contain differences, set `keep_shape=True` when calling `compare()`. The result then contains every row of the original DataFrames and a `self`/`other` pair for every column, with equal values shown as NaN. For example,

```
# Keep all rows and columns in the result
diff = df.compare(df1, keep_shape=True)
print(diff)
```

Yields below output.

```
# Output:
  Courses              Fee          Duration        Discount
     self   other     self    other     self other      self  other
0     NaN     NaN      NaN      NaN      NaN   NaN       NaN    NaN
1   NumPy  Hadoop  25000.0  24000.0      NaN   NaN       NaN    NaN
2     NaN     NaN      NaN      NaN      NaN   NaN       NaN    NaN
3     NaN     NaN      NaN      NaN      NaN   NaN       NaN    NaN
4     NaN     NaN  26000.0  21000.0      NaN   NaN       NaN    NaN
```
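To make "same size" concrete: with `keep_shape=True` the result keeps every row label of the input and holds a `self`/`other` pair for each original column, so the 5 x 4 sample DataFrames give a 5 x 8 result. A minimal check, assuming the sample data from section 2.3:

```python
import pandas as pd

# Same sample data as in section 2.3
df = pd.DataFrame({
    "Courses": ["Spark", "NumPy", "pandas", "Java", "PySpark"],
    "Fee": [20000, 25000, 30000, 22000, 26000],
    "Duration": ["30days", "40days", "35days", "60days", "50days"],
    "Discount": [1000, 2500, 1500, 1200, 3000],
})
df1 = pd.DataFrame({
    "Courses": ["Spark", "Hadoop", "pandas", "Java", "PySpark"],
    "Fee": [20000, 24000, 30000, 22000, 21000],
    "Duration": ["30days", "40days", "35days", "60days", "50days"],
    "Discount": [1000, 2500, 1500, 1200, 3000],
})

diff = df.compare(df1, keep_shape=True)
# Same number of rows as the inputs, two columns per original column
print(diff.shape)
# Row labels are the same as in the inputs
print(diff.index.equals(df.index))
# Every original column appears in the outer column level
print(diff.columns.get_level_values(0).unique().tolist())
```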

## 6. Using keep_equal & keep_shape

Set both `keep_shape` and `keep_equal` to `True` when calling `compare()` to return a result that keeps every row and column along with the equal values of the given DataFrames.

```
# Keep equal values and all rows and columns
diff = df.compare(df1, keep_equal=True, keep_shape=True)
print(diff)
```

Yields below output.

```
# Output:
   Courses             Fee        Duration          Discount
      self    other   self  other     self   other      self  other
0    Spark    Spark  20000  20000   30days  30days      1000   1000
1    NumPy   Hadoop  25000  24000   40days  40days      2500   2500
2   pandas   pandas  30000  30000   35days  35days      1500   1500
3     Java     Java  22000  22000   60days  60days      1200   1200
4  PySpark  PySpark  26000  21000   50days  50days      3000   3000
```
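With both options enabled, nothing is dropped or masked, so each side of the result is a complete copy of its input. As a quick sanity check (a sketch, assuming the sample data from section 2.3), selecting the `self` and `other` sides with `DataFrame.xs()` gives back the original DataFrames. The names `left` and `right` are illustrative.

```python
import pandas as pd

# Same sample data as in section 2.3
df = pd.DataFrame({
    "Courses": ["Spark", "NumPy", "pandas", "Java", "PySpark"],
    "Fee": [20000, 25000, 30000, 22000, 26000],
    "Duration": ["30days", "40days", "35days", "60days", "50days"],
    "Discount": [1000, 2500, 1500, 1200, 3000],
})
df1 = pd.DataFrame({
    "Courses": ["Spark", "Hadoop", "pandas", "Java", "PySpark"],
    "Fee": [20000, 24000, 30000, 22000, 21000],
    "Duration": ["30days", "40days", "35days", "60days", "50days"],
    "Discount": [1000, 2500, 1500, 1200, 3000],
})

diff = df.compare(df1, keep_equal=True, keep_shape=True)

# Drop the innermost column level by selecting one side at a time
left = diff.xs("self", axis=1, level=1)
right = diff.xs("other", axis=1, level=1)
print(left.equals(df), right.equals(df1))
```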

## Frequently Asked Questions on Difference Between Two DataFrames

**What is the purpose of the compare() method in Pandas?**

The `compare()` method in pandas compares two DataFrames and highlights the differences between them. It provides a convenient way to identify which values differ between corresponding cells of two identically labeled DataFrames (same shape, row index, and column labels).

**How does the compare() method display differences?**

The `compare()` method displays differences in a tabular format with hierarchical (MultiIndex) column labels. Each compared column has two sub-columns, `self` and `other` by default (configurable with `result_names`), holding the values from the calling DataFrame and the other DataFrame, respectively. By default, only differing values are shown, and equal values appear as NaN.

**How can I show equal values instead of NaN in the result?**

Set the `keep_equal` parameter of `compare()` to `True`. Values that are equal in both DataFrames are then shown in the result instead of NaN. On its own, `keep_equal` does not add rows or columns that contain no differences; use `keep_shape=True` for that.

**How can I keep all rows and columns in the result?**

Set the `keep_shape` parameter of `compare()` to `True`. The result then contains every row and column of the input DataFrames, including those without differences. This does not make it possible to compare DataFrames of different shapes: `compare()` raises a `ValueError` when the two DataFrames are not identically labeled.

**Can I customize the behavior of the compare() method further?**

Yes. Besides `keep_equal` and `keep_shape`, you can use `align_axis` to lay out the differences row by row instead of column by column, and `result_names` (pandas 1.5.0 and later) to replace the default `self` and `other` labels. These parameters let you control which elements are included in the result and how it is presented.

**Does the compare() method modify the original DataFrames?**

The `compare()` method does not modify the original DataFrames. It returns a new DataFrame containing the comparison results, so you can analyze the differences without altering the original data.

## Conclusion

In this article, I have explained how to use the `DataFrame.compare()` function, its syntax and parameters, and how to compare two DataFrames with examples.
