
In Pandas, the `corr()` method is used to calculate pairwise correlation of columns, excluding NA/null values. This method is useful when you want to understand the linear relationship between numerical variables in your DataFrame.

In this article, I will explain the Pandas DataFrame `corr()` method, covering its syntax, parameters, and usage, and show how it returns a DataFrame of correlation coefficients between the columns.

Key Points –

• The `corr()` method is used to compute the pairwise correlation of columns in a DataFrame, excluding NA/null values.
• It supports three types of correlation methods, `pearson` (default), `kendall`, and `spearman`.
• The method returns a DataFrame containing the correlation coefficients between the columns.
• The `method` parameter specifies the correlation method, and the `min_periods` parameter specifies the minimum number of observations required per pair of columns to produce a valid result.
• Missing values are excluded pairwise: for each pair of columns, only the rows where both values are present are used in the calculation.

## Quick Examples of Pandas DataFrame corr()

If you are in a hurry, below are some quick examples of the Pandas DataFrame `corr()` method.

``````
# Quick examples of pandas dataframe corr()

# Calculate the correlation matrix
correlation_matrix = df.corr()

# Compute the correlation matrix
# Using pearson correlation
corr_matrix = df.corr(method='pearson')

# Calculate Spearman correlation coefficients
corr_matrix = df.corr(method='spearman')

# Calculate Kendall's tau correlation coefficients
corr_matrix = df.corr(method='kendall')
``````

## Pandas DataFrame corr() Introduction

Let’s look at the syntax of the Pandas DataFrame `corr()` method.

``````
# Syntax of Pandas DataFrame corr()
DataFrame.corr(method='pearson', min_periods=1, numeric_only=False)
``````

### Parameters of the DataFrame corr()

Following are the parameters of the DataFrame corr() function.

• `method` – Specifies the correlation method to use. It accepts one of three built-in options (a callable can also be passed).
• `pearson` – Default method; computes the standard Pearson correlation coefficient.
• `kendall` – Computes the Kendall Tau correlation coefficient.
• `spearman` – Computes the Spearman rank correlation coefficient.
• `min_periods` – Specifies the minimum number of observations required per pair of columns to produce a valid result. If not provided, it defaults to 1.
• `numeric_only` – Specifies whether only numeric (float, int, or boolean) columns should be used in the operation. By default, it is set to `False` (pandas 2.0 and later).
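To make the `min_periods` parameter more concrete, here is a minimal sketch. The DataFrame `df_nan` and its values are made up for illustration and are not part of the article's dataset. Column `B` has two missing values, so the `A`/`B` pair has only three rows where both values are present.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'B' is missing in two rows, so the A-B pair
# has only three rows where both values are present.
df_nan = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.0, np.nan, 6.0, np.nan, 10.0],
    "C": [5.0, 3.0, 4.0, 1.0, 2.0],
})

# Default min_periods=1: the A-B coefficient is computed from 3 rows.
loose = df_nan.corr(method="pearson")

# Require at least 4 complete observations per pair:
# pairs with fewer complete rows come back as NaN.
strict = df_nan.corr(method="pearson", min_periods=4)

print("min_periods=1:\n", loose)
print("min_periods=4:\n", strict)
```

With `min_periods=4`, any pair involving `B` is reported as `NaN` because too few complete rows are available, while the `A`/`C` pair (five complete rows) still gets a coefficient.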

### Return Value

It returns a DataFrame containing the pairwise correlation coefficients of the columns.
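As a rough illustration of the return value, the sketch below inspects the result for the `Fee` and `Discount` data used later in this article. The names `df_demo`, `result`, and `fee_vs_discount` are chosen here for illustration only.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Fee": [25000, 22000, 26000, 23000, 30000],
    "Discount": [800, 1300, 2000, 1500, 1000],
})

result = df_demo.corr()

# The result is a square DataFrame labelled by column name on both axes.
print(type(result).__name__)
print(list(result.index))
print(list(result.columns))

# Pick out a single coefficient by label; the matrix is symmetric,
# so result.loc["Discount", "Fee"] holds the same value.
fee_vs_discount = result.loc["Fee", "Discount"]
print(round(fee_vs_discount, 6))
```

Because the result is an ordinary DataFrame, the usual tools such as `.loc`, `.round()`, or boolean filtering can be applied to it.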

## Basic Correlation Matrix

To compute and display the basic correlation matrix for the given DataFrame, you can use the `corr()` method from Pandas.

First, let’s create a Pandas DataFrame from a Python dictionary, where the columns are `Fee` and `Discount`.

``````
# Create a pandas DataFrame
import pandas as pd

technologies = {
    'Fee': [25000, 22000, 26000, 23000, 30000],
    'Discount': [800, 1300, 2000, 1500, 1000]
}
df = pd.DataFrame(technologies)
print("Original DataFrame:\n", df)

# Output:
# Original DataFrame:
#       Fee  Discount
# 0  25000       800
# 1  22000      1300
# 2  26000      2000
# 3  23000      1500
# 4  30000      1000
``````

Now that we have our DataFrame, we can use the `corr()` method to compute the correlation matrix for the numerical columns (`Fee` and `Discount`).

``````
# Calculate the correlation matrix
correlation_matrix = df.corr()
print("Correlation matrix:\n", correlation_matrix)
``````

In the above example, the correlation matrix shows the pairwise correlation between `Fee` and `Discount`. The diagonal values are `1.000000`, because each column is perfectly correlated with itself. The off-diagonal value of `-0.210225` indicates a weak negative linear relationship between `Fee` and `Discount`: higher fees tend to go with slightly lower discounts in this sample.
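To see where the `-0.210225` figure comes from, the following sketch recomputes the Pearson coefficient by hand with NumPy and compares it to what `corr()` returns. The variable names (`df_demo`, `manual_r`, `pandas_r`) are illustrative; the data is the same `Fee` and `Discount` data as above.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    "Fee": [25000, 22000, 26000, 23000, 30000],
    "Discount": [800, 1300, 2000, 1500, 1000],
})

fee = df_demo["Fee"].to_numpy(dtype=float)
discount = df_demo["Discount"].to_numpy(dtype=float)

# Deviations of each value from its column mean
fee_dev = fee - fee.mean()
discount_dev = discount - discount.mean()

# Pearson r = sum of products of deviations, divided by the square
# root of the product of the sums of squared deviations
manual_r = (fee_dev * discount_dev).sum() / np.sqrt(
    (fee_dev ** 2).sum() * (discount_dev ** 2).sum()
)

pandas_r = df_demo.corr().loc["Fee", "Discount"]
print(round(manual_r, 6), round(pandas_r, 6))
```

Both values agree, and the negative sign comes from the numerator: the rows with the largest fee deviations pair with below-average discounts.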

## Using Pearson Correlation

Alternatively, to explicitly use the Pearson correlation method when computing the correlation matrix in Pandas, specify `method='pearson'` within the `corr()` method.

``````
# Compute the correlation matrix
# Using pearson correlation
corr_matrix = df.corr(method='pearson')
print("Pearson Correlation Coefficients:\n", corr_matrix)

# Output:
# Pearson Correlation Coefficients:
#                 Fee  Discount
# Fee       1.000000 -0.210225
# Discount -0.210225  1.000000
``````

In the above example, we use the DataFrame `df` with two columns, `Fee` and `Discount`, and call the `corr()` method with `method='pearson'` to compute the Pearson correlation coefficients. The resulting `corr_matrix` is a DataFrame where each cell represents the correlation coefficient between two columns. Because Pearson is the default method, the result is identical to calling `df.corr()` with no arguments.

## Using Spearman Correlation

To calculate Spearman’s rank correlation coefficients for all columns in a Pandas DataFrame, you can use the `corr()` method with `method='spearman'`.

``````
# Calculate Spearman correlation coefficients
corr_matrix = df.corr(method='spearman')
print("Spearman correlation coefficients:\n", corr_matrix)

# Output:
# Spearman correlation coefficients:
#            Fee  Discount
# Fee       1.0      -0.1
# Discount -0.1       1.0
``````

In the above example, we use the `corr()` method on the DataFrame `df` with `method='spearman'` to compute Spearman’s rank correlation coefficients. The resulting `corr_matrix` is a DataFrame where each cell represents the Spearman’s rank correlation coefficient between two columns.
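Spearman’s coefficient can be understood as the Pearson coefficient computed on the ranks of the values rather than on the values themselves. The sketch below checks this on the same data; `df_demo`, `ranks`, and `pearson_on_ranks` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    "Fee": [25000, 22000, 26000, 23000, 30000],
    "Discount": [800, 1300, 2000, 1500, 1000],
})

# Replace every value with its rank within its column (1 = smallest)
ranks = df_demo.rank()
print(ranks)

# Pearson correlation of the ranks ...
pearson_on_ranks = ranks.corr(method="pearson")

# ... should match Spearman correlation of the original values
spearman = df_demo.corr(method="spearman")

print(np.allclose(pearson_on_ranks.to_numpy(), spearman.to_numpy()))
```

Because only the ordering of the values matters, Spearman’s coefficient is less sensitive to outliers than Pearson’s and can pick up monotonic relationships that are not strictly linear.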

## Using Kendall Correlation

To calculate Kendall’s tau correlation coefficients for all columns in a Pandas DataFrame, you can use the `corr()` method with `method='kendall'`.

``````
# Calculate Kendall's tau correlation coefficients
corr_matrix = df.corr(method='kendall')
print("Kendall's tau correlation coefficients:\n", corr_matrix)

# Output:
# Kendall's tau correlation coefficients:
#            Fee  Discount
# Fee       1.0       0.0
# Discount  0.0       1.0
``````

In the above example, we use the `corr()` method on the DataFrame `df` with `method='kendall'` to compute Kendall’s tau correlation coefficients. A value of `0.0` means the concordant and discordant pairs of rows balance out exactly.
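The `0.0` in the output above can be reproduced by counting concordant and discordant pairs by hand. This is a simplified sketch that assumes there are no tied values (true for this data), in which case tau is simply the difference between the two counts divided by the total number of pairs. All variable names are illustrative.

```python
from itertools import combinations

import pandas as pd

df_demo = pd.DataFrame({
    "Fee": [25000, 22000, 26000, 23000, 30000],
    "Discount": [800, 1300, 2000, 1500, 1000],
})

fee = df_demo["Fee"].tolist()
discount = df_demo["Discount"].tolist()

concordant = 0
discordant = 0
for i, j in combinations(range(len(fee)), 2):
    # Positive product: both columns move in the same direction
    # between rows i and j (concordant); negative: opposite (discordant)
    direction = (fee[i] - fee[j]) * (discount[i] - discount[j])
    if direction > 0:
        concordant += 1
    elif direction < 0:
        discordant += 1

n_pairs = len(fee) * (len(fee) - 1) // 2

# Simplified tau, valid only because this data has no ties
manual_tau = (concordant - discordant) / n_pairs
print(concordant, discordant, manual_tau)
```

For data with ties, the tie-corrected variant of the statistic divides by a different denominator, so this simple count would no longer be expected to match.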

## Frequently Asked Questions on Pandas DataFrame corr() Method

What is the corr() method used for?

The `corr()` method is used to calculate the correlation between the columns of a DataFrame in pandas, which is a popular data manipulation library in Python. Correlation measures the strength and direction of the linear relationship between two variables.

How do I calculate Pearson correlation coefficients using corr()?

To calculate Pearson correlation coefficients, simply use the `corr()` method without specifying the method parameter, as it is the default.

How do I calculate Kendall’s tau correlation coefficients using corr()?

Specify the `method` parameter as `'kendall'` to calculate Kendall’s tau correlation coefficients.

How does corr() handle missing values (NaNs)?

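The `corr()` method automatically excludes NA/null values from the computation. The exclusion is pairwise: for each pair of columns, only the rows where both columns have a value are used, so a missing value in one column does not affect the correlations between the other columns.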
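The sketch below illustrates this pairwise behaviour on a small made-up DataFrame (`df_gaps` and its values are hypothetical). It compares the coefficient from `corr()` with one computed after explicitly dropping the incomplete rows for just that pair of columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one gap in 'A' and one gap in 'B'
df_gaps = pd.DataFrame({
    "A": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    "B": [2.0, 1.0, 4.0, np.nan, 5.0, 7.0],
    "C": [9.0, 7.0, 8.0, 5.0, 6.0, 3.0],
})

pairwise = df_gaps.corr()

# For the A-B pair, only rows where both A and B are present count
ab_complete = df_gaps[["A", "B"]].dropna()
expected_ab = ab_complete["A"].corr(ab_complete["B"])

# For the A-C pair, the gap in B is irrelevant: 5 rows are used
ac_complete = df_gaps[["A", "C"]].dropna()
expected_ac = ac_complete["A"].corr(ac_complete["C"])

print(len(ab_complete), len(ac_complete))
print(np.isclose(pairwise.loc["A", "B"], expected_ab))
print(np.isclose(pairwise.loc["A", "C"], expected_ac))
```

One consequence is that different cells of the matrix may be based on different numbers of rows, which is what the `min_periods` parameter is meant to guard against.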

Can I use the corr() method on a DataFrame with non-numeric columns?

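The `corr()` method can only compute correlations for numeric columns. With the default `numeric_only=False` (pandas 2.0 and later), a DataFrame that contains non-numeric columns such as strings raises an error. Pass `numeric_only=True` to have the non-numeric columns excluded from the calculation, or select the numeric columns yourself before calling `corr()`.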
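As a small sketch, assuming pandas 2.0 or later, the example below adds a string column to the `Fee` and `Discount` data. The `Courses` values and the name `df_mixed` are made up for illustration.

```python
import pandas as pd

# Hypothetical DataFrame with one non-numeric column
df_mixed = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    "Fee": [25000, 22000, 26000, 23000, 30000],
    "Discount": [800, 1300, 2000, 1500, 1000],
})

# numeric_only=True drops 'Courses' before computing the correlations
numeric_corr = df_mixed.corr(numeric_only=True)
print(list(numeric_corr.columns))

# Without numeric_only=True, recent pandas versions are expected to
# raise because the strings cannot be converted to floats.
try:
    df_mixed.corr()
except (ValueError, TypeError) as exc:
    print("corr() without numeric_only=True failed:", type(exc).__name__)
```

An equivalent alternative is to select the numeric columns first, for example with `df_mixed.select_dtypes(include="number").corr()`.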

## Conclusion

In this article, you have learned the Pandas DataFrame `corr()` method, including its syntax, parameters, and usage, and how to find the correlation between DataFrame columns using the `pearson`, `kendall`, and `spearman` methods.

Happy Learning!!