  • Post last modified:August 6, 2024
Pandas DataFrame mad() Method

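In Pandas, the mad() method calculates the Mean Absolute Deviation (MAD) of the values in a DataFrame or Series. The MAD is the average of the absolute deviations of the data points from their mean, which makes it a useful measure of the variability or dispersion in your data. Note that mad() was deprecated in pandas 1.5.0 and removed in pandas 2.0, so the mad() examples in this article require pandas 1.x; on newer versions the same result can be computed as (df - df.mean()).abs().mean().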
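To make the definition concrete, here is a minimal plain-Python sketch of the calculation. The list of values is only illustrative (it matches column A of the sample DataFrame used later in this article), and the variable names are my own.

```python
# Plain-Python sketch of the mean absolute deviation (no pandas needed).
values = [5, 8, 13, 15]  # illustrative data

mean = sum(values) / len(values)             # 10.25
abs_devs = [abs(v - mean) for v in values]   # [5.25, 2.25, 2.75, 4.75]
mad = sum(abs_devs) / len(abs_devs)          # average of the absolute deviations

print(mad)  # 3.75
```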

In this article, I will explain the Pandas DataFrame mad() method, covering its syntax, parameters, and usage, and show how it returns a Series with the mean absolute deviation along the chosen axis.

Key Points –

  • The mad() method calculates the Mean Absolute Deviation (MAD) of the values in a DataFrame or Series, which is the average absolute deviation from the mean.
  • It can work down the rows to give one value per column (axis=0, the default) or across the columns to give one value per row (axis=1).
  • The skipna parameter controls missing data: with the default skipna=True, NA/null values are excluded from the calculation.
  • The level parameter allows computation along a particular level of a MultiIndex.
  • mad() was deprecated in pandas 1.5.0 and removed in pandas 2.0; on newer versions, compute the same value with (df - df.mean()).abs().mean().

Syntax of Pandas DataFrame mad() Method

Following is the syntax of the Pandas DataFrame mad() method.


# Syntax of Pandas DataFrame mad()
DataFrame.mad(axis=None, skipna=True, level=None)

Parameters of the DataFrame mad()

Following are the parameters of the DataFrame mad() method.

  • axis – {index (0), columns (1)}, default None, which is treated as 0 for a DataFrame. The axis along which to compute the mean absolute deviation. 0 or index computes the MAD of each column (working down the rows). 1 or columns computes the MAD of each row (working across the columns).
  • skipna – bool, default True. Exclude NA/null values when computing the result.
  • level – int or level name, default None. If the axis is a MultiIndex (hierarchical), compute along a particular level, collapsing into a DataFrame.

Return Value

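The mad() method returns a Series with the Mean Absolute Deviation of the values along the specified axis. When the level parameter is used on a MultiIndex DataFrame, it returns a DataFrame with one row per group instead.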
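Since mad() is not available in pandas 2.0 and later, the following sketch shows one way to get a similar interface and return value on any version. The helper name mean_abs_dev is hypothetical (it is not part of pandas), and it assumes you only want the numeric columns.

```python
import pandas as pd

def mean_abs_dev(df: pd.DataFrame, axis: int = 0, skipna: bool = True) -> pd.Series:
    """Hypothetical helper (not a pandas API): MAD around the mean along `axis`."""
    numeric = df.select_dtypes(include="number")
    center = numeric.mean(axis=axis)  # mean per column (axis=0) or per row (axis=1)
    # Subtract each mean from its own column/row, then take absolute values
    deviations = numeric.sub(center, axis=1 - axis).abs()
    return deviations.mean(axis=axis, skipna=skipna)

df = pd.DataFrame({
    'A': [5, 8, 13, 15],
    'B': [2, 9, 4, 6],
    'C': [8, 6, 12, 3]
})

by_col = mean_abs_dev(df)          # Series indexed by column labels
by_row = mean_abs_dev(df, axis=1)  # Series indexed by row labels

print(list(by_col.index))  # ['A', 'B', 'C']
print(list(by_row.index))  # [0, 1, 2, 3]
```

The result is a Series whose index is the column labels for axis=0 and the row labels for axis=1.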

Usage of Pandas DataFrame mad() Method

To see the mad() method in action, let's first create a Pandas DataFrame from a Python dictionary, where the columns are A, B, and C.


import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'A': [5, 8, 13, 15],
    'B': [2, 9, 4, 6],
    'C': [8, 6, 12, 3] 
})

print("Original DataFrame:\n",df)

Yields below output.


# Output:
Original DataFrame:
    A  B   C
0   5  2   8
1   8  9   6
2  13  4  12
3  15  6   3

Calculating MAD for Each Column

To calculate the Mean Absolute Deviation (MAD) for each column in the DataFrame, you can use the mad() method with the default axis parameter (axis=0, which works down the rows and returns one value per column).


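# Note: mad() was removed in pandas 2.0, so this requires pandas 1.x
# Calculating MAD for each column (axis defaults to 0)
df2 = df.mad()
print("MAD for each column:\n", df2)

# Equivalent call with the axis given explicitly
df2 = df.mad(axis=0)
print("MAD for each column:\n", df2)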

In the above example, the mad() method calculates the mean absolute deviation of each column by first computing the mean of each column, then finding the absolute differences from the mean, and finally calculating the average of these absolute differences. Both calls give the same result. This example yields the below output.


# Output:
MAD for each column:
A    3.75
B    2.25
C    2.75
dtype: float64
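The three steps described above can also be written out by hand, which is handy on pandas 2.0 and later where mad() is no longer available. This is a sketch using the same sample data; the intermediate variable names are my own.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [5, 8, 13, 15],
    'B': [2, 9, 4, 6],
    'C': [8, 6, 12, 3]
})

col_means = df.mean()             # step 1: mean of each column
abs_dev = (df - col_means).abs()  # step 2: absolute distance from the column mean
col_mad = abs_dev.mean()          # step 3: average of those distances

print(col_means.tolist())  # [10.25, 5.25, 7.25]
print(col_mad.tolist())    # [3.75, 2.25, 2.75]
```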

Calculating MAD for Each Row

Alternatively, to calculate the Mean Absolute Deviation (MAD) for each row in the DataFrame, you can use the mad() method with the axis parameter set to 1 for row-wise operation.


# Calculating MAD for each row
df2 = df.mad(axis=1)
print("MAD for each row:\n", df2)

In the above example, the mad() method calculates the MAD for each row by first computing the mean of the row, then finding the absolute differences from this mean, and finally averaging these absolute differences.


# Output:
MAD for each row:
0    2.000000
1    1.111111
2    3.777778
3    4.666667
dtype: float64
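The row-wise calculation can be sketched by hand as well. One detail to watch is that a plain df - df.mean(axis=1) would try to match the row labels of the means against the column labels of the DataFrame, so this sketch uses sub() with axis=0 to subtract each row's mean from that row.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [5, 8, 13, 15],
    'B': [2, 9, 4, 6],
    'C': [8, 6, 12, 3]
})

row_means = df.mean(axis=1)  # one mean per row
# axis=0 aligns row_means with the row index, so each row loses its own mean
row_mad = df.sub(row_means, axis=0).abs().mean(axis=1)

print(row_mad.round(6).tolist())  # [2.0, 1.111111, 3.777778, 4.666667]
```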

Calculating MAD with Missing Values Skipped

To calculate the Mean Absolute Deviation (MAD) while skipping missing values in a DataFrame, you can use the mad() method with the skipna parameter set to True (which is the default). This ensures that any NaN values are excluded from the calculation.


import pandas as pd
import numpy as np

# Sample DataFrame with missing values
df = pd.DataFrame({
    'A': [5, np.nan, 13, 15],
    'B': [2, 9, np.nan, 6],
    'C': [8, 6, 12, np.nan]
})

# Calculate MAD for each column, skipping missing values
df2 = df.mad(axis=0, skipna=True)
print("MAD with missing values skipped for each column:\n", df2)

# Equivalent call relying on the default axis=0
df2 = df.mad(skipna=True)
print("MAD with missing values skipped for each column:\n", df2)

Yields below output.


# Output:
MAD with missing values skipped for each column:
A    4.000000
B    2.444444
C    2.222222
dtype: float64

To calculate the MAD for each row while skipping missing values, you can use the mad() method with the axis parameter set to 1 and ensure that skipna is set to True (which is the default behavior).


# Calculate MAD for each row, skipping missing values
df2 = df.mad(axis=1, skipna=True)
print("MAD with missing values skipped for each row:\n", df2)

Yields below output.


# Output:
MAD with missing values skipped for each row:
0    2.0
1    1.5
2    0.5
3    4.5
dtype: float64
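To see what skipping missing values changes, the sketch below applies the manual formula to the same data and toggles skipna on the final averaging step only. Treating that step as the place where skipna takes effect is an assumption made for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [5, np.nan, 13, 15],
    'B': [2, 9, np.nan, 6],
    'C': [8, 6, 12, np.nan]
})

abs_dev = (df - df.mean()).abs()     # the centering step ignores NaN by default

skipped = abs_dev.mean(skipna=True)  # NaN entries are left out of each average
kept = abs_dev.mean(skipna=False)    # a single NaN in a column makes its result NaN

print(skipped.round(6).tolist())  # [4.0, 2.444444, 2.222222]
print(kept.isna().all())          # True
```

Every column in this sample contains one missing value, so with skipna=False all three results come out as NaN.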

Calculate MAD for Numeric Columns Only

Similarly, to calculate the Mean Absolute Deviation (MAD) for numeric columns only in a DataFrame that contains mixed data types, you can explicitly select the numeric columns first and then apply the mad() method.


import pandas as pd
import numpy as np

# Sample DataFrame with mixed types and missing values
df = pd.DataFrame({
    'A': [5, np.nan, 13, 15],
    'B': [2, 9, np.nan, 6],
    'C': [8, 6, 12, np.nan],
    'D': ['a', 'b', 'c', 'd']  
})

# Select only numeric columns
numeric_df = df.select_dtypes(include=[np.number])

# Calculate MAD for each numeric column
df2 = numeric_df.mad(axis=0, skipna=True)
print("MAD for each numeric column (skipping missing values):\n", df2)

In the above example, the select_dtypes(include=[np.number]) method is used to select only the numeric columns from the DataFrame, excluding any non-numeric columns. The mad() method is then applied to the filtered DataFrame (numeric_df), calculating the MAD for each numeric column while skipping any missing values (skipna=True).


# Output:
MAD for each numeric column (skipping missing values):
A    4.000000
B    2.444444
C    2.222222
dtype: float64
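The same filtering step carries over to the manual formula on pandas 2.0 and later. In those versions, reductions such as mean() no longer drop non-numeric columns silently, so selecting the numeric columns first matters even more. This is a sketch under that assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [5, np.nan, 13, 15],
    'B': [2, 9, np.nan, 6],
    'C': [8, 6, 12, np.nan],
    'D': ['a', 'b', 'c', 'd']
})

# Keep only numeric columns; the string column D is dropped
numeric_df = df.select_dtypes(include="number")

numeric_mad = (numeric_df - numeric_df.mean()).abs().mean()

print(list(numeric_mad.index))        # ['A', 'B', 'C']
print(numeric_mad.round(6).tolist())  # [4.0, 2.444444, 2.222222]
```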

Calculate MAD with MultiIndex DataFrame

Finally, to calculate the Mean Absolute Deviation (MAD) within each group of a MultiIndex DataFrame, you can use the mad() method with the level parameter. This computes the MAD separately for each value of the chosen index level. Note that the level parameter was itself deprecated in pandas 1.3 in favor of groupby(level=...).


import pandas as pd
import numpy as np

# Sample MultiIndex DataFrame
index = pd.MultiIndex.from_tuples([('A', 1), ('A', 2), ('B', 1), ('B', 2)], names=['Group', 'Number'])
df = pd.DataFrame({
    'A': [5, np.nan, 13, 15],
    'B': [2, 9, np.nan, 6]
}, index=index)
print("Original MultiIndex DataFrame:\n", df)

# Calculate MAD along level 'Group'
df2 = df.mad(level='Group', skipna=True)
print("MAD along level 'Group' (skipping missing values):\n", df2)

In the above example, a MultiIndex DataFrame is created with Group and Number as the index levels. The mad() method is used with the level parameter set to Group. This calculates the MAD for each column within each group, ignoring missing values (skipna=True).


Original MultiIndex DataFrame:
                  A    B
Group Number           
A     1        5.0  2.0
      2        NaN  9.0
B     1       13.0  NaN
      2       15.0  6.0
MAD along level 'Group' (skipping missing values):
          A    B
Group          
A      0.0  3.5
B      1.0  0.0
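On pandas versions where mad() or its level parameter is not available, a groupby on the index level gives the same per-group numbers. The function name mad_of below is hypothetical; this is a sketch rather than an official replacement.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('B', 1), ('B', 2)], names=['Group', 'Number']
)
df = pd.DataFrame({
    'A': [5, np.nan, 13, 15],
    'B': [2, 9, np.nan, 6]
}, index=index)

def mad_of(series):
    """Hypothetical helper: MAD of one column within one group (NaN skipped)."""
    return (series - series.mean()).abs().mean()

# Group rows by the 'Group' level and apply the helper to every column
group_mad = df.groupby(level='Group').agg(mad_of)
print(group_mad)
```

For this data, the result has one row per group (A and B) and matches the values shown in the output above.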

Frequently Asked Questions on Pandas DataFrame mad() Method

What does the mad() method do in Pandas?

The mad() method calculates the Mean Absolute Deviation (MAD) of the values in a DataFrame. MAD measures the average absolute deviation of each data point from the mean of the dataset.

What does the skipna parameter do?

The skipna parameter determines whether to exclude NA/null values from the computation. If set to True (default), NA/null values are ignored. If set to False, NA/null values are included, which may result in NA/null in the output.

Can the mad() method handle missing values?

The mad() method can handle missing values. By default, it skips missing values (skipna=True). You can change this behavior by setting skipna=False.

Is the mad() method available for both DataFrames and Series?

The mad() method is available for both DataFrames and Series in pandas versions before 2.0. When used on a Series, it returns a single value: the MAD of the Series values.
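For a Series, the manual formula reduces to a single number. A minimal sketch, using illustrative values:

```python
import pandas as pd

s = pd.Series([2, 9, 4, 6])  # illustrative values

# Absolute distance of each value from the mean, then averaged
series_mad = (s - s.mean()).abs().mean()

print(series_mad)  # 2.25
```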

How do you calculate the MAD for each column in a DataFrame?

To calculate the Mean Absolute Deviation (MAD) for each column in a DataFrame, you can use the mad() method with the default axis=0 parameter.

Conclusion

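In conclusion, the Pandas DataFrame mad() method calculates the Mean Absolute Deviation (MAD) of your data, a measure of variability obtained by averaging the absolute differences from the mean. You can compute it per column or per row, skip missing values with skipna, and compute it per group of a MultiIndex level. Because mad() is no longer available from pandas 2.0 onward, use (df - df.mean()).abs().mean() on newer versions.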

Happy Learning!!