In Pandas, the corr() method is used to calculate pairwise correlation of columns, excluding NA/null values. This method is useful when you want to understand the linear relationship between numerical variables in your DataFrame.
In this article, I will explain the Pandas DataFrame corr() method by using its syntax, parameters, usage, and how we can return a DataFrame showing the correlation coefficients between the columns.
Key Points –
- The corr() method is used to compute the pairwise correlation of columns in a DataFrame, excluding NA/null values.
- It supports three types of correlation methods, pearson (default), kendall, and spearman.
- The method returns a DataFrame containing the correlation coefficients between the columns.
- The method parameter specifies the correlation method, and the min_periods parameter specifies the minimum number of observations required per pair of columns to produce a valid result.
- The corr() method automatically excludes NA/null values from the correlation calculation.
Quick Examples of Pandas DataFrame corr()
If you are in a hurry, below are some quick examples of the Pandas DataFrame corr() method.
# Quick examples of pandas dataframe corr()
# numeric_only=True skips non-numeric columns such as Courses
# (required in pandas 2.0+, where numeric_only defaults to False)
# Calculate the correlation matrix
correlation_matrix = df.corr(numeric_only=True)
# Compute the correlation matrix
# Using pearson correlation
corr_matrix = df.corr(method='pearson', numeric_only=True)
# Calculate Spearman correlation coefficients
corr_matrix = df.corr(method='spearman', numeric_only=True)
# Calculate Kendall's tau correlation coefficients
corr_matrix = df.corr(method='kendall', numeric_only=True)
Pandas DataFrame corr() Introduction
Let’s look at the syntax of the Pandas DataFrame corr() method.
# Syntax of Pandas DataFrame corr()
DataFrame.corr(method='pearson', min_periods=1, numeric_only=False)
Parameters of the DataFrame corr()
Following are the parameters of the DataFrame corr() method.
- method – This parameter specifies the method of correlation to be used. It has three built-in options (a callable that takes two 1-D arrays and returns a float is also accepted).
  - pearson – Default method, computes the standard Pearson correlation coefficient.
  - kendall – Computes the Kendall Tau correlation coefficient.
  - spearman – Computes the Spearman rank correlation coefficient.
- min_periods – This parameter specifies the minimum number of observations required per pair of columns to have a valid result. If not provided, it defaults to 1.
- numeric_only – Specifies if only numeric values should be used in the operation. By default, it is set to False (since pandas 2.0), so a DataFrame that contains non-numeric columns needs numeric_only=True.
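The effect of min_periods and numeric_only is easier to see on a small example. The sketch below uses a made-up DataFrame (df_params and its columns are illustrative, not part of this article's data) in which column B has only two non-missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column B has only two non-missing values
df_params = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.0, np.nan, np.nan, np.nan, 9.0],
    "C": [5.0, 4.0, 3.0, 2.0, 1.0],
    "Label": ["a", "b", "c", "d", "e"],
})

# numeric_only=True skips the string column "Label"
default_corr = df_params.corr(numeric_only=True)

# Require at least 3 overlapping observations per pair of columns;
# pairs with fewer (such as A and B here) become NaN
strict_corr = df_params.corr(min_periods=3, numeric_only=True)

print(default_corr)
print(strict_corr)
```

With the default min_periods=1, the A/B coefficient is computed from just two rows, which is rarely meaningful. Raising min_periods turns such thin pairs into NaN instead.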
Return Value
It returns a DataFrame containing the pairwise correlation coefficients of the columns.
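As a quick illustration of the return value, the sketch below builds a two-column DataFrame (df_ret is an illustrative name, using the Fee and Discount values that appear later in this article) and inspects the result.

```python
import pandas as pd

# Illustrative numeric-only DataFrame
df_ret = pd.DataFrame({
    "Fee": [25000, 22000, 26000, 23000, 30000],
    "Discount": [800, 1300, 2000, 1500, 1000],
})

result = df_ret.corr()

# The result is a square DataFrame indexed by the column names
print(type(result).__name__)
print(result.shape)

# A single coefficient can be looked up by row and column label
fee_vs_discount = result.loc["Fee", "Discount"]
print(fee_vs_discount)
```

Because the matrix is symmetric, result.loc["Fee", "Discount"] and result.loc["Discount", "Fee"] hold the same value.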
Basic Correlation Matrix
To compute and display the basic correlation matrix for the given DataFrame, you can use the corr() method from Pandas.
First, let’s create a Pandas DataFrame using data from a Python dictionary, where the columns are Courses, Fee, and Discount.
# Create a pandas DataFrame
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
'Fee' :[25000,22000,26000,23000,30000],
'Discount':[800,1300,2000,1500,1000]
}
df = pd.DataFrame(technologies)
print("Original DataFrame:\n",df)
This prints the five-row DataFrame with the Courses, Fee, and Discount columns.
Now that we have our DataFrame, we can use the corr() method to compute the correlation matrix for the numerical columns (Fee and Discount).
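# Calculate the correlation matrix
# numeric_only=True skips the non-numeric Courses column
# (in pandas 2.0+ the default is False and string columns raise an error)
correlation_matrix = df.corr(numeric_only=True)
print("Correlation matrix:\n", correlation_matrix)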
In the above example, the correlation matrix shows the pairwise correlation between Fee and Discount. The diagonal values are 1.000000, because every column is perfectly correlated with itself. The off-diagonal value of -0.210225 indicates a weak negative linear relationship between Fee and Discount.
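To see where the value -0.210225 comes from, the Pearson coefficient can be reproduced by hand. This is a minimal sketch using the Fee and Discount values from the DataFrame above; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

fee = np.array([25000, 22000, 26000, 23000, 30000], dtype=float)
discount = np.array([800, 1300, 2000, 1500, 1000], dtype=float)

# Deviations of each value from its column mean
fee_dev = fee - fee.mean()
discount_dev = discount - discount.mean()

# Pearson r = sum of cross-products / sqrt(product of sums of squares)
manual_r = (fee_dev * discount_dev).sum() / np.sqrt(
    (fee_dev ** 2).sum() * (discount_dev ** 2).sum()
)

# Same quantity computed by pandas for a single pair of Series
pandas_r = pd.Series(fee).corr(pd.Series(discount))

print(round(manual_r, 6))
print(round(pandas_r, 6))
```

Both values should agree with the off-diagonal entry of the correlation matrix.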
Using Pearson Correlation
Alternatively, to explicitly use the Pearson correlation method when computing the correlation matrix in Pandas, specify method='pearson' within the corr() method.
# Compute the correlation matrix
# Using pearson correlation (numeric_only=True skips the Courses column)
corr_matrix = df.corr(method='pearson', numeric_only=True)
print("Pearson Correlation Coefficients:\n", corr_matrix)
# Output:
# Pearson Correlation Coefficients:
#                 Fee  Discount
# Fee       1.000000 -0.210225
# Discount -0.210225  1.000000
In the above example, we use the DataFrame df with three columns, Courses, Fee, and Discount. We call the corr() method on df with method='pearson' to compute the Pearson correlation coefficients for the numeric columns. The resulting corr_matrix is a DataFrame where each cell represents the correlation coefficient between two columns.
Using Spearman Correlation
To calculate Spearman’s rank correlation coefficients for all columns in a Pandas DataFrame, you can use the corr() method with method='spearman'.
# Calculate Spearman correlation coefficients
# (numeric_only=True skips the Courses column)
corr_matrix = df.corr(method='spearman', numeric_only=True)
print("Spearman correlation coefficients:\n", corr_matrix)
# Output:
# Spearman correlation coefficients:
#            Fee  Discount
# Fee       1.0      -0.1
# Discount -0.1       1.0
In the above example, we use the corr() method on the DataFrame df with method='spearman' to compute Spearman’s rank correlation coefficients. The resulting corr_matrix is a DataFrame where each cell represents the Spearman’s rank correlation coefficient between two columns.
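Spearman’s coefficient is the Pearson coefficient applied to the ranks of the values rather than the values themselves. The sketch below checks this on the Fee and Discount data; df_rank is an illustrative name.

```python
import pandas as pd

df_rank = pd.DataFrame({
    "Fee": [25000, 22000, 26000, 23000, 30000],
    "Discount": [800, 1300, 2000, 1500, 1000],
})

# Spearman correlation computed directly
spearman = df_rank.corr(method="spearman")

# Replace each value by its rank within the column, then apply Pearson
pearson_on_ranks = df_rank.rank().corr(method="pearson")

print(spearman)
print(pearson_on_ranks)
```

Because only the ordering of values matters, Spearman is less sensitive to outliers than Pearson and captures monotonic relationships that are not strictly linear.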
Using Kendall Correlation
To calculate Kendall’s tau correlation coefficients for all columns in a Pandas DataFrame, you can use the corr() method with method='kendall'.
# Calculate Kendall's tau correlation coefficients
# (numeric_only=True skips the Courses column)
corr_matrix = df.corr(method='kendall', numeric_only=True)
print("Kendall's tau correlation coefficients:\n", corr_matrix)
# Output:
# Kendall's tau correlation coefficients:
#            Fee  Discount
# Fee       1.0       0.0
# Discount  0.0       1.0
In the above example, we use the corr() method on the DataFrame df with method='kendall' to compute Kendall’s tau correlation coefficients.
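Kendall’s tau compares every pair of rows and counts how often the two columns move in the same direction (concordant) or in opposite directions (discordant). The sketch below uses plain Python and the simple no-ties formula, (concordant - discordant) / number of pairs. This is an assumption that holds for this data, since neither Fee nor Discount contains repeated values.

```python
from itertools import combinations

fee = [25000, 22000, 26000, 23000, 30000]
discount = [800, 1300, 2000, 1500, 1000]

concordant = 0
discordant = 0

# Compare every pair of rows once
for (f1, d1), (f2, d2) in combinations(zip(fee, discount), 2):
    direction = (f1 - f2) * (d1 - d2)
    if direction > 0:
        concordant += 1   # both columns change in the same direction
    elif direction < 0:
        discordant += 1   # columns change in opposite directions

n_pairs = len(fee) * (len(fee) - 1) // 2
manual_tau = (concordant - discordant) / n_pairs

print(concordant, discordant, manual_tau)
```

Here the concordant and discordant pairs balance out exactly, which is why the off-diagonal Kendall coefficient is 0.0.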
Frequently Asked Questions on Pandas DataFrame corr() Method
The corr() method is used to calculate the correlation between the columns of a DataFrame in pandas, which is a popular data manipulation library in Python. Correlation measures the strength and direction of the relationship between two variables (linear for Pearson, rank-based for Spearman and Kendall).
To calculate Pearson correlation coefficients, simply use the corr() method without specifying the method parameter, as pearson is the default.
Specify the method parameter as 'kendall' to calculate Kendall’s tau correlation coefficients.
The corr() method automatically excludes NA/null values in the computation. The exclusion is pairwise: for each pair of columns, only the rows where both columns have a value are used, so different pairs may be computed from different rows.
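A small sketch of how missing values are handled, using made-up data (df_na and its columns are illustrative). It compares the matrix from corr() with the one obtained after dropping every row that contains any NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in different rows
df_na = pd.DataFrame({
    "X": [1.0, 2.0, 3.0, 4.0, np.nan],
    "Y": [2.0, 4.0, np.nan, 8.0, 10.0],
    "Z": [1.0, 3.0, 2.0, 5.0, 4.0],
})

# corr() drops missing values separately for each pair of columns
pairwise = df_na.corr()

# Reproduce the X/Y entry by keeping only rows where both are present
xy_rows = df_na[["X", "Y"]].dropna()
manual_xy = xy_rows["X"].corr(xy_rows["Y"])

# Dropping incomplete rows first forces every pair to use the same rows
listwise = df_na.dropna().corr()

print(pairwise)
print(listwise)
```

The X/Z entry differs between the two matrices, because corr() uses four rows for that pair while the dropna() version uses only three.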
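The corr() method computes correlations only for numeric data. In pandas 2.0 and later, numeric_only defaults to False, so calling corr() on a DataFrame with non-numeric columns (such as Courses) typically raises an error. Pass numeric_only=True to exclude them. In older pandas versions, non-numeric columns were dropped automatically.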
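Two common ways to restrict the calculation to numeric columns are sketched below, assuming a pandas version that supports the numeric_only parameter on corr(). The name df_mixed is illustrative and mirrors the DataFrame used in this article.

```python
import pandas as pd

df_mixed = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    "Fee": [25000, 22000, 26000, 23000, 30000],
    "Discount": [800, 1300, 2000, 1500, 1000],
})

# Option 1: select the numeric columns explicitly, then correlate
corr_selected = df_mixed.select_dtypes(include="number").corr()

# Option 2: let corr() skip non-numeric columns
corr_flag = df_mixed.corr(numeric_only=True)

print(corr_selected)
print(corr_flag)
```

Both options give the same matrix; select_dtypes() makes the column choice explicit, which can be easier to read in longer pipelines.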
Conclusion
In this article, you have learned the Pandas DataFrame corr() method, including its syntax, parameters, and usage, and how you can find the correlation between DataFrame columns using the pearson, kendall, and spearman methods.
Happy Learning!!