In Pandas, the mad() method calculates the Mean Absolute Deviation (MAD) of the values in a DataFrame or Series. The MAD is the average of the absolute deviations of the data points from the mean of the dataset, which makes it useful for understanding the variability or dispersion within your data. Note that mad() was deprecated in pandas 1.5.0 and removed in pandas 2.0.0, so the examples in this article require a pandas version earlier than 2.0.
In this article, I will explain the Pandas DataFrame mad() method, covering its syntax, parameters, and usage, and show how to return a Series with the mean absolute deviation for each axis label.
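To make the definition concrete, here is a minimal sketch that computes the MAD by hand for a small Series, without calling mad(). The sample values and variable names are assumptions chosen for illustration.

```python
import pandas as pd

# Illustrative sample values (an assumption for this sketch)
values = pd.Series([2, 4, 6, 8])

# Step 1: mean of the data
mean_value = values.mean()             # 5.0

# Step 2: absolute deviation of each point from the mean
abs_dev = (values - mean_value).abs()  # 3.0, 1.0, 1.0, 3.0

# Step 3: average of the absolute deviations
mad_value = abs_dev.mean()
print(mad_value)                       # 2.0
```

Each step uses only basic Series arithmetic, so this sketch does not depend on mad() being available.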
Key Points –
- The mad() method calculates the Mean Absolute Deviation (MAD) of the values in a DataFrame or Series, which is the average absolute deviation from the mean.
- It can compute the MAD down each column (axis=0, the default) or across each row (axis=1).
- The skipna parameter controls how missing data is handled. By default, NA/null values are excluded from the calculation so that they do not turn the result into NaN.
- The level parameter allows computation along a particular level of a MultiIndex.
Syntax of Pandas DataFrame mad() Method
Following is the syntax of the Pandas DataFrame mad() method.
# Syntax of Pandas DataFrame mad()
DataFrame.mad(axis=None, skipna=True, level=None)
Parameters of the DataFrame mad()
Following are the parameters of the DataFrame mad() method.
- axis – {index (0), columns (1)}, default 0. The axis along which to compute the mean absolute deviation. Use 0 or index to get one value per column, and 1 or columns to get one value per row.
- skipna – bool, default True. Exclude NA/null values when computing the result.
- level – int or level name, default None. If the axis is a MultiIndex (hierarchical), compute along a particular level; the result is then a DataFrame with one row per value of that level.
Return Value
The mad() method returns a Series with the Mean Absolute Deviation of the values along the specified axis. When the level parameter is specified, it returns a DataFrame instead.
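Because mad() is not available in pandas 2.0 and later, the same result can be reproduced with basic arithmetic. The sketch below defines a hypothetical helper, mean_abs_dev(), that mirrors the axis and skipna parameters described above. The helper name and the sample data are assumptions for illustration and are not part of pandas.

```python
import pandas as pd

def mean_abs_dev(df, axis=0, skipna=True):
    """Hypothetical stand-in for DataFrame.mad(); not a pandas API.

    axis must be the integer 0 (per column) or 1 (per row).
    """
    # Mean along the requested axis (one value per column for axis=0)
    center = df.mean(axis=axis, skipna=skipna)
    # Subtract the mean from every value, aligning on the opposite axis
    deviations = df.sub(center, axis=1 - axis).abs()
    # Average the absolute deviations along the same axis
    return deviations.mean(axis=axis, skipna=skipna)

# Illustrative sample data
df = pd.DataFrame({
    'A': [5, 8, 13, 15],
    'B': [2, 9, 4, 6],
    'C': [8, 6, 12, 3]
})

col_result = mean_abs_dev(df)          # one value per column
row_result = mean_abs_dev(df, axis=1)  # one value per row
print(col_result)
print(row_result)
```

The helper returns a Series in both cases, indexed by column labels for axis=0 and by row labels for axis=1.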
Usage of Pandas DataFrame mad() Method
The mad() method in Pandas is used to calculate the mean absolute deviation of the values in a DataFrame.
Now, let's create a Pandas DataFrame from a Python dictionary, where the columns are A, B, and C.
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({
'A': [5, 8, 13, 15],
'B': [2, 9, 4, 6],
'C': [8, 6, 12, 3]
})
print("Original DataFrame:\n",df)
Yields below output.
Calculating MAD for Each Column
To calculate the Mean Absolute Deviation (MAD) for each column in the DataFrame, you can use the mad() method with the default axis parameter (0, which returns one value per column).
# Calculating MAD for each column
df2 = df.mad()
print("MAD for each column:\n", df2)
# Equivalent call with axis=0 passed explicitly
df2 = df.mad(axis=0)
print("MAD for each column:\n", df2)
In the above example, the mad() method calculates the mean absolute deviation of each column by first computing the mean of each column, then finding the absolute differences from the mean, and finally calculating the average of these absolute differences. This example yields the below output.
# Output:
MAD for each column:
A 3.75
B 2.25
C 2.75
dtype: float64
Calculating MAD for Each Row
Alternatively, to calculate the Mean Absolute Deviation (MAD) for each row in the DataFrame, you can use the mad() method with the axis parameter set to 1 for row-wise operation.
# Calculating MAD for each row
df2 = df.mad(axis=1)
print("MAD for each row:\n", df2)
In the above example, the mad() method calculates the MAD for each row by first computing the mean of the row, then finding the absolute differences from this mean, and finally averaging these absolute differences.
# Output:
MAD for each row:
0 2.000000
1 1.111111
2 3.777778
3 4.666667
dtype: float64
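When reproducing the row-wise calculation by hand, one detail is easy to get wrong: subtracting the row means with the plain - operator aligns them with the column labels rather than the row index. The following sketch, which uses only standard DataFrame arithmetic, shows the pitfall and one way around it; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [5, 8, 13, 15],
    'B': [2, 9, 4, 6],
    'C': [8, 6, 12, 3]
})

row_means = df.mean(axis=1)

# Pitfall: plain subtraction aligns the Series labels (0..3) with the
# column labels (A, B, C), so nothing matches and every cell is NaN
misaligned = df - row_means

# sub(..., axis=0) aligns the row means with the row index instead
row_mad = df.sub(row_means, axis=0).abs().mean(axis=1)
print(row_mad)
```

The values in row_mad should agree with the row-wise output shown above.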
Calculating MAD with Missing Values Skipped
To calculate the Mean Absolute Deviation (MAD) while skipping missing values in a DataFrame, you can use the mad() method with the skipna parameter set to True (which is the default). This ensures that any NaN values are excluded from the calculation.
import pandas as pd
import numpy as np
# Sample DataFrame with missing values
df = pd.DataFrame({
'A': [5, np.nan, 13, 15],
'B': [2, 9, np.nan, 6],
'C': [8, 6, 12, np.nan]
})
# Calculate MAD for each column, skipping missing values
df2 = df.mad(axis=0, skipna=True)
print("MAD with missing values skipped for each column:\n", df2)
# Equivalent call relying on the default axis=0
df2 = df.mad(skipna=True)
print("MAD with missing values skipped for each column:\n", df2)
Yields below output.
# Output:
MAD with missing values skipped for each column:
A 4.000000
B 2.444444
C 2.222222
dtype: float64
To calculate the MAD for each row while skipping missing values, you can use the mad() method with the axis parameter set to 1 and ensure that skipna is set to True (which is the default behavior).
# Calculate MAD for each row, skipping missing values
df2 = df.mad(axis=1, skipna=True)
print("MAD with missing values skipped for each row:\n", df2)
Yields below output.
# Output:
MAD with missing values skipped for each row:
0 2.0
1 1.5
2 0.5
3 4.5
dtype: float64
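To see what skipna changes, the sketch below contrasts skipping and not skipping missing values using plain arithmetic instead of mad(). The helper col_mad() is hypothetical, and the sample data is a variant of the one above in which column C has no missing values, so that the difference is visible.

```python
import numpy as np
import pandas as pd

# Variant of the sample data: only columns A and B contain NaN
df = pd.DataFrame({
    'A': [5, np.nan, 13, 15],
    'B': [2, 9, np.nan, 6],
    'C': [8, 6, 12, 3]
})

def col_mad(frame, skipna=True):
    # Hypothetical helper, not a pandas API: column-wise MAD by hand
    center = frame.mean(skipna=skipna)
    return (frame - center).abs().mean(skipna=skipna)

skipped = col_mad(df, skipna=True)   # NaN values are ignored
strict = col_mad(df, skipna=False)   # any NaN makes the column's result NaN
print(skipped)
print(strict)
```

With skipna=False, columns A and B come back as NaN, while the complete column C still gets a numeric result.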
Calculate MAD for Numeric Columns Only
Similarly, to calculate the Mean Absolute Deviation (MAD) for numeric columns only in a DataFrame that may contain mixed data types, you can select the numeric columns explicitly and then apply the mad() method.
import pandas as pd
import numpy as np
# Sample DataFrame with mixed types and missing values
df = pd.DataFrame({
'A': [5, np.nan, 13, 15],
'B': [2, 9, np.nan, 6],
'C': [8, 6, 12, np.nan],
'D': ['a', 'b', 'c', 'd']
})
# Select only numeric columns
numeric_df = df.select_dtypes(include=[np.number])
# Calculate MAD for each numeric column
df2 = numeric_df.mad(axis=0, skipna=True)
print("MAD for each numeric column (skipping missing values):\n", df2)
In the above example, the select_dtypes(include=[np.number]) method is used to select only the numeric columns from the DataFrame, excluding any non-numeric columns. The mad() method is then applied to the filtered DataFrame (numeric_df), calculating the MAD for each numeric column while skipping any missing values (skipna=True).
# Output:
MAD for each numeric column (skipping missing values):
A 4.000000
B 2.444444
C 2.222222
dtype: float64
Calculate MAD with MultiIndex DataFrame
Finally, to calculate the Mean Absolute Deviation (MAD) for each level in a MultiIndex DataFrame, you can use the mad() method with the level parameter. This allows you to calculate the MAD along a specific level of the MultiIndex.
import pandas as pd
import numpy as np
# Sample MultiIndex DataFrame
index = pd.MultiIndex.from_tuples([('A', 1), ('A', 2), ('B', 1), ('B', 2)], names=['Group', 'Number'])
df = pd.DataFrame({
'A': [5, np.nan, 13, 15],
'B': [2, 9, np.nan, 6]
}, index=index)
print("Original MultiIndex DataFrame:\n", df)
# Calculate MAD along level 'Group'
df2 = df.mad(level='Group', skipna=True)
print("MAD along level 'Group' (skipping missing values):\n", df2)
In the above example, a MultiIndex DataFrame is created with Group and Number as the index levels. The mad() method is used with the level parameter set to Group. This calculates the MAD for each column within each group, ignoring missing values (skipna=True).
# Output:
Original MultiIndex DataFrame:
                 A    B
Group Number
A     1        5.0  2.0
      2        NaN  9.0
B     1       13.0  NaN
      2       15.0  6.0
MAD along level 'Group' (skipping missing values):
         A    B
Group
A      0.0  3.5
B      1.0  0.0
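The same per-group result can also be sketched with groupby(), which is the usual way to aggregate by an index level in pandas and does not require mad() or the level parameter. The lambda below is an illustrative hand-written MAD, not a built-in pandas aggregation.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('B', 1), ('B', 2)], names=['Group', 'Number'])
df = pd.DataFrame({
    'A': [5, np.nan, 13, 15],
    'B': [2, 9, np.nan, 6]
}, index=index)

# Group rows by the 'Group' level, then compute the MAD of each column
# within each group; mean() skips NaN values by default
group_mad = df.groupby(level='Group').agg(
    lambda s: (s - s.mean()).abs().mean())
print(group_mad)
```

The result is a DataFrame with one row per group, matching the shape of the output above.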
Frequently Asked Questions on Pandas DataFrame mad() Method
The mad() method calculates the Mean Absolute Deviation (MAD) of the values in a DataFrame. MAD measures the average absolute deviation of each data point from the mean of the dataset.
The skipna parameter determines whether to exclude NA/null values from the computation. If set to True (default), NA/null values are ignored. If set to False, NA/null values are included, so any column or row that contains a missing value produces NaN in the output.
The mad() method can handle missing values. By default, it skips missing values (skipna=True). You can change this behavior by setting skipna=False.
The mad() method is available for both DataFrames and Series in Pandas. When used on a Series, it calculates the MAD of the Series values.
To calculate the Mean Absolute Deviation (MAD) for each column in a DataFrame, you can use the mad() method with the default axis=0 parameter.
Conclusion
In conclusion, the Pandas DataFrame mad() method calculates the Mean Absolute Deviation (MAD) of data, providing a measure of variability by averaging the absolute differences from the mean.
Happy Learning!!