How to Compare Two Columns Using Pandas?

Post last modified: August 6, 2024

Pandas is a powerful Python library for data manipulation and analysis. If you want to compare two columns in a Pandas DataFrame, you can use various methods depending on your specific requirements.

You can compare two columns in several ways, for example, with the equality operator (==), the != operator, the equals() method, np.where(), apply(), and isnull(). In this article, I will explain each of these approaches with examples.

Key Points –

  • Use the == operator to compare two columns element-wise, or the equals() method to test whether two entire columns are identical (it returns a single boolean).
  • Perform element-wise comparisons using operators like ==, !=, <, >, <=, and >= to compare values in two columns.
  • Ensure that the data types of the compared columns are compatible, as certain operations may not be valid for mismatched types.
  • Prefer vectorized operations applied directly to entire columns; the apply() function also works, but it evaluates the comparison row by row, which is typically slower on large DataFrames.
  • Utilize boolean indexing to filter and extract rows based on the comparison results, allowing for further analysis or manipulation.
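
To illustrate the point about data types, here is a minimal sketch. The DataFrame df_types and its values are hypothetical and are not part of the examples in this article; it assumes Column2 was loaded as strings (object dtype), which can happen with data read from text files.

```python
import pandas as pd

# Hypothetical data: Column1 holds integers, Column2 holds numeric strings
df_types = pd.DataFrame({
    "Column1": [5, 15, 8],
    "Column2": pd.Series(["5", "12", "8"], dtype=object),
})

# == between an int and a str is simply False for every row
mismatch = df_types["Column1"] == df_types["Column2"]

# Ordering comparisons between int and str are not defined and raise TypeError
try:
    df_types["Column1"] > df_types["Column2"]
    ordering_failed = False
except TypeError:
    ordering_failed = True

# Converting Column2 to numbers first makes the comparison meaningful
fixed = df_types["Column1"] == pd.to_numeric(df_types["Column2"])
print(mismatch.tolist(), ordering_failed, fixed.tolist())
```

Converting both columns to a common type before comparing avoids silently getting False for values that look the same.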

Quick Examples of Comparing Two Columns

If you are in a hurry, below are some quick examples of comparing two columns.


# Quick examples of comparing two columns

# example 1: Check for equality element-wise
df['Equality'] = df['Column1'] == df['Column2']

# example 2: Compare two columns using equals() method
columns_equal = df['Column1'].equals(df['Column2'])

# example 3: Check for inequality element-wise
columns_not_equal = df['Column1'] != df['Column2']

# example 4: Compare two columns using np.where() method
df['ComparisonResult'] = np.where(df['Column1'] > df['Column2'], 'Column1 > Column2', 'Column1 <= Column2')

# example 5: Compare two columns using apply()
df['ComparisonResult'] = df.apply(lambda row: 'Column1 > Column2' if row['Column1'] > row['Column2'] else 'Column1 <= Column2', axis=1)

# example 6: Check for null values in either column
df['AnyNull'] = df['Column1'].isnull() | df['Column2'].isnull()

Create DataFrame

Now, let’s create a Pandas DataFrame from a Python dictionary, with the columns Column1 and Column2.


# Create DataFrame 
import pandas as pd
import numpy as np

data = {'Column1': [5, 15, 8, 20, 25],
        'Column2': [4, 12, 8, 17, 25]}
df = pd.DataFrame(data)
print("Create DataFrame:\n",df)

This yields the output below.

# Output:
# Create DataFrame:
#     Column1  Column2
# 0        5        4
# 1       15       12
# 2        8        8
# 3       20       17
# 4       25       25

Compare DataFrame Columns using Equal Operator

To check for equality between two columns element-wise in a Pandas DataFrame, you can directly use the equality operator (==).


# Check for equality element-wise
df['Equality'] = df['Column1'] == df['Column2']
print("Check for equality element-wise:\n",df)

In the above example, the new column Equality is True for rows where the value in Column1 equals the value in Column2, and False otherwise. Adjust the column names to match your own DataFrame. This example yields the output below.

# Output:
# Check for equality element-wise:
#     Column1  Column2  Equality
# 0        5        4     False
# 1       15       12     False
# 2        8        8      True
# 3       20       17     False
# 4       25       25      True

Compare Two Columns Using equals() Method

If you want to check for equality between entire columns (not element-wise), you can use the equals() method.


# Compare two columns 
# Using equals() method
columns_equal = df['Column1'].equals(df['Column2'])
print(f"Columns are equal: {columns_equal}")

# Output:
# Columns are equal: False

In the above example, Column1 and Column2 are not entirely equal. The equals() method returns False because some of the values differ between the two columns.
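
Two details of equals() are easy to miss: it treats NaN values in the same position as equal, and it requires the two Series to have the same dtype. The sketch below illustrates both; the Series a, b, ints, and floats are hypothetical and used only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical Series used only for this illustration
a = pd.Series([1.0, np.nan, 3.0])
b = pd.Series([1.0, np.nan, 3.0])

# Element-wise ==: NaN is never equal to NaN, so the middle entry is False
elementwise_all = (a == b).all()

# equals(): NaN values in the same position count as equal
same_with_nan = a.equals(b)

# equals() also requires matching dtypes: int64 vs float64 is not equal
ints = pd.Series([1, 2, 3])
floats = pd.Series([1.0, 2.0, 3.0])
same_across_dtypes = ints.equals(floats)
values_match = (ints == floats).all()

print(elementwise_all, same_with_nan, same_across_dtypes, values_match)
```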

Check for Inequality

To check for inequality between two columns in a Pandas DataFrame, you can use the != operator, which compares the columns element-wise and returns a boolean Series. Calling any() on that Series reduces it to a single value. In this case, the output is True because at least one pair of corresponding values in Column1 and Column2 is not equal.


# Check for inequality element-wise, then reduce the result with any()
columns_not_equal = df['Column1'] != df['Column2']
print(f"Columns are not equal: {columns_not_equal.any()}")

# Output:
# Columns are not equal: True
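
The boolean Series returned by != can also be used to count or extract the mismatched rows. This is a small sketch that rebuilds the article's DataFrame so it runs on its own; the variable names not_equal, mismatch_count, and mismatched_rows are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Column1": [5, 15, 8, 20, 25],
                   "Column2": [4, 12, 8, 17, 25]})

# Boolean Series: True where the two columns differ
not_equal = df["Column1"] != df["Column2"]

# True counts as 1, so sum() gives the number of mismatched rows
mismatch_count = int(not_equal.sum())

# Boolean indexing keeps only the rows that differ
mismatched_rows = df[not_equal]
print(mismatch_count)
print(mismatched_rows)
```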

Compare Two Columns Using the np.where() Function

You can use np.where() to compare two columns and create a new column based on the comparison result.


# Compare two columns using np.where() method
df['ComparisonResult'] = np.where(df['Column1'] > df['Column2'], 'Column1 > Column2', 'Column1 <= Column2')
print("DataFrame with Comparison Result:\n", df)

# Output:
# DataFrame with Comparison Result:
#     Column1  Column2    ComparisonResult
# 0        5        4   Column1 > Column2
# 1       15       12   Column1 > Column2
# 2        8        8  Column1 <= Column2
# 3       20       17   Column1 > Column2
# 4       25       25  Column1 <= Column2

In the above example, the new column ComparisonResult is created by comparing the values in Column1 with those in Column2. If Column1 is greater than Column2, the corresponding cell in ComparisonResult is assigned the string 'Column1 > Column2'; otherwise it is assigned 'Column1 <= Column2'.
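
np.where() handles two outcomes. If you need more than two (greater, equal, less), one option is np.select(), which takes a list of conditions and a matching list of labels. This is a sketch rather than part of the original examples, and the column name ThreeWayResult is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Column1": [5, 15, 8, 20, 25],
                   "Column2": [4, 12, 8, 17, 25]})

# Conditions are checked in order; the first one that is True wins
conditions = [
    df["Column1"] > df["Column2"],
    df["Column1"] == df["Column2"],
]
choices = ["Column1 > Column2", "Column1 == Column2"]

# Rows matching no condition get the default label
df["ThreeWayResult"] = np.select(conditions, choices, default="Column1 < Column2")
print(df)
```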

Compare Two Columns Using the apply() Method

Similarly, the apply() method can be used to compare two columns element-wise and create a new column based on the comparison result.


# Compare two columns using apply()
df['ComparisonResult'] = df.apply(lambda row: 'Column1 > Column2' if row['Column1'] > row['Column2'] else 'Column1 <= Column2', axis=1)
print("DataFrame with Comparison Result:\n", df)

In the above example, the apply() method is used with a lambda function to compare the values in Column1 and Column2 for each row. If the condition is true, Column1 > Column2 is assigned to the corresponding cell in ComparisonResult; otherwise, Column1 <= Column2 is assigned. This yields the same output as above.
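
Since apply() with axis=1 calls a Python function once per row, it is usually slower than np.where() on large DataFrames, even though both produce the same labels. The sketch below only checks that the two results agree; the names labels_where and labels_apply are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Column1": [5, 15, 8, 20, 25],
                   "Column2": [4, 12, 8, 17, 25]})

# Vectorized: one comparison over the whole columns
labels_where = np.where(df["Column1"] > df["Column2"],
                        "Column1 > Column2", "Column1 <= Column2")

# Row by row: the lambda runs once for every row
labels_apply = df.apply(
    lambda row: "Column1 > Column2" if row["Column1"] > row["Column2"]
    else "Column1 <= Column2",
    axis=1,
)

# Both approaches give identical labels
same_labels = labels_apply.tolist() == labels_where.tolist()
print(same_labels)
```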

Check for Null Values in Either Column

To check for null values in either column and create a new column indicating the presence of null values, you can use the isnull() method along with the | (logical OR) operator.


# Create DataFrame 
import pandas as pd
import numpy as np

data = {'Column1': [5, None, 8, 20, None],
        'Column2': [4, 12, 8, None, 25]}
df = pd.DataFrame(data)

# Check for null values in either column
df['AnyNull'] = df['Column1'].isnull() | df['Column2'].isnull()
print("DataFrame with Null Value Check:\n", df)

# Output:
# DataFrame with Null Value Check:
#     Column1  Column2  AnyNull
# 0      5.0      4.0    False
# 1      NaN     12.0     True
# 2      8.0      8.0    False
# 3     20.0      NaN     True
# 4      NaN     25.0     True

In the above example, the new column AnyNull will be True for rows where either Column1 or Column2 has a null value, and False otherwise.
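
Null checks also matter for equality, because a comparison involving NaN is always False, even when both values are missing. If two missing values should count as a match, you can combine == with isnull(). The DataFrame df_nan below is hypothetical and is not the one used above.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a row where both values are missing
df_nan = pd.DataFrame({"Column1": [5, np.nan, 8, np.nan],
                       "Column2": [5, 12, np.nan, np.nan]})

# Plain ==: any comparison involving NaN is False, even NaN == NaN
plain_equal = df_nan["Column1"] == df_nan["Column2"]

# Treat two missing values in the same row as a match
both_null = df_nan["Column1"].isnull() & df_nan["Column2"].isnull()
nan_aware_equal = plain_equal | both_null

print(plain_equal.tolist())
print(nan_aware_equal.tolist())
```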

FAQs on Comparing Two Columns

How can I check if two columns are equal in a Pandas DataFrame?

You can use the equality operator (==) or the equals() method to check if two columns are equal. For element-wise comparison, use the equality operator. To check if entire columns are equal, use the equals() method.

How do I check if two columns are equal element-wise?

To check if two columns are equal element-wise in a Pandas DataFrame, you can use the == operator.

How do I check if two columns are not equal?

To check if two columns are not equal element-wise in a Pandas DataFrame, you can use the != operator.

How can I find the rows where two columns are equal?

To find the rows where two columns are equal in a Pandas DataFrame, you can use boolean indexing. The expression df['Column1'] == df['Column2'] creates a boolean mask, and passing this mask inside square brackets (df[...]) keeps only the rows where the condition is True.
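
As a minimal sketch of that filter, using the same data as the earlier examples (the variable names equal_rows and equal_rows_query are illustrative), the second form uses DataFrame.query(), which expresses the same condition as a string:

```python
import pandas as pd

df = pd.DataFrame({"Column1": [5, 15, 8, 20, 25],
                   "Column2": [4, 12, 8, 17, 25]})

# Boolean mask inside square brackets keeps rows where the mask is True
equal_rows = df[df["Column1"] == df["Column2"]]

# Equivalent filter written as a query string
equal_rows_query = df.query("Column1 == Column2")

print(equal_rows)
```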

How do I compare two columns and create a new column with the result?

You can compare two columns element-wise and create a new column based on the result using various methods in Pandas. One common approach is to use the np.where() method for conditional comparisons.

Conclusion

In this article, I have explained how to compare two columns in a Pandas DataFrame, a common and essential task in data analysis. Several methods, such as the equality operator (==), the equals() method, and the apply() method, can be used to perform these comparisons. Additionally, the np.where() function provides a flexible way to create new columns based on conditional comparisons.

Happy Learning!!
