  • Post category:Pandas
  • Post last modified:July 23, 2024
Pandas DataFrame median() Method

In Pandas, the median() method is used to compute the median of values for each column or row in a DataFrame, excluding NA/null values by default.

In this article, I will explain the Pandas DataFrame median() method: its syntax, parameters, and usage, including how to return the median of the values along a specified axis.

Key Points –

  • Computes the median of the values along the specified axis of the DataFrame.
  • Accepts axis=0 (default) to compute the median for each column and axis=1 to compute the median for each row.
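  • By default, skipna=True excludes NA/null values from the computation. With skipna=False, any column (or row) that contains a missing value returns NaN.
  • The numeric_only parameter controls whether non-numeric columns are skipped. In pandas 2.0 and later it defaults to False, so a column that cannot be treated as numeric (such as one holding strings) raises a TypeError unless you pass numeric_only=True. Older versions dropped such columns by default.
  • The level parameter computed the median along a particular level of a MultiIndex in older versions. It was deprecated in pandas 1.3 and removed in pandas 2.0; use groupby(level=...).median() instead.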

Syntax of Pandas DataFrame median() Method

The syntax of the pandas DataFrame.median() method is shown below.


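# Syntax of DataFrame.median() method (pandas 2.0 and later)
DataFrame.median(axis=0, skipna=True, numeric_only=False, **kwargs)

# Before pandas 2.0 the method also accepted level=None,
# and numeric_only defaulted to None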

Parameters of the median()

Following are the parameters of the median() method:

  • axis – {index (0), columns (1)}. The axis along which to compute the median. The default is axis=0, which means it will calculate the median for each column.
  • skipna – bool, default True. Exclude NA/null values when computing the median. If False, the result is NaN for any column or row that contains an NA/null value.
  • level – int or level name, default None. Available only before pandas 2.0 (deprecated in 1.3, removed in 2.0). If the axis is a MultiIndex (hierarchical), it computed the median along a particular level. In current versions, use groupby(level=...).median() instead.
  • numeric_only – bool, default False in pandas 2.0 and later (None in older versions). If True, include only float, int, and boolean data.
  • **kwargs – Additional keyword arguments to be passed to the method.

Return Value

The method returns a Series containing the median values for the requested axis.
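As a small sketch of what that return value looks like, the snippet below uses a made-up two-column DataFrame (the data and variable names are illustrative assumptions). It shows that the result is a Series indexed by column labels for axis=0 and by row labels for axis=1, so individual medians can be read by label.

```python
import pandas as pd

# Hypothetical data, used only to inspect the return value
df = pd.DataFrame({'A': [2, 4, 6], 'B': [1, 3, 5]})

col_medians = df.median()          # Series indexed by column labels
row_medians = df.median(axis=1)    # Series indexed by row labels

print(type(col_medians).__name__)  # Series
print(list(col_medians.index))     # ['A', 'B']
print(col_medians['A'])            # 4.0
```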

Usage of Pandas DataFrame median() Method

The median() method in Pandas is used to calculate the median (the middle value) of numeric data along a specified axis of a DataFrame.
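To make "the middle value" concrete, here is a minimal sketch with two made-up Series (the names odd_count and even_count are hypothetical). With an odd number of values the median is the middle element after sorting; with an even number it is the mean of the two middle elements.

```python
import pandas as pd

# Hypothetical sample data, chosen only to illustrate the definition
odd_count = pd.Series([7, 1, 5])        # sorted: 1, 5, 7
even_count = pd.Series([7, 1, 5, 3])    # sorted: 1, 3, 5, 7

odd_median = odd_count.median()    # middle element: 5.0
even_median = even_count.median()  # mean of 3 and 5: 4.0
print(odd_median, even_median)
```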

To run some examples of the pandas DataFrame median() method, let’s create a Pandas DataFrame from a Python dictionary, with columns A, B, and C.


import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'A': [2, 4, 6, 8, 10],
    'B': [1, 3, 5, 7, 9],
    'C': [4, 6, 1, 9, 3]
})
print("Original DataFrame:\n",df)

# Output:
# Original DataFrame:
#     A  B  C
# 0   2  1  4
# 1   4  3  6
# 2   6  5  1
# 3   8  7  9
# 4  10  9  3

Column-wise Median (Default)

To compute the column-wise median of a DataFrame, call the median() method without specifying the axis parameter; it defaults to axis=0.


# Compute the median of each column
column_medians = df.median()
print("Column-wise medians:\n", column_medians)

# Output:
# Column-wise medians:
# A    6.0
# B    5.0
# C    4.0
# dtype: float64

Here,

  • Column A: The values are [2, 4, 6, 8, 10]. The median is 6.0.
  • Column B: The values are [1, 3, 5, 7, 9]. The median is 5.0.
  • Column C: The values are [4, 6, 1, 9, 3]. Sorted, they are [1, 3, 4, 6, 9], so the median is 4.0.

Row-wise Median

Alternatively, to compute the median for each row in a DataFrame, specify axis=1 in the median() method.


# Compute the median of each row
row_medians = df.median(axis=1)
print("Row-wise medians:\n", row_medians)

# Output:
# Row-wise medians:
# 0    2.0
# 1    4.0
# 2    5.0
# 3    8.0
# 4    9.0
# dtype: float64

Here,

  • Row 0: The values are [2, 1, 4]. The median is 2.0.
  • Row 1: The values are [4, 3, 6]. The median is 4.0.
  • Row 2: The values are [6, 5, 1]. The median is 5.0.
  • Row 3: The values are [8, 7, 9]. The median is 8.0.
  • Row 4: The values are [10, 9, 3]. The median is 9.0.
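Row-wise medians are often more useful next to the data they summarize. The sketch below, which assumes the same sample DataFrame and uses a hypothetical column name row_median, attaches them as a new column with assign().

```python
import pandas as pd

df = pd.DataFrame({
    'A': [2, 4, 6, 8, 10],
    'B': [1, 3, 5, 7, 9],
    'C': [4, 6, 1, 9, 3]
})

# assign() returns a new DataFrame; the original df is left unchanged
df_with_median = df.assign(row_median=df.median(axis=1))
print(df_with_median)
```

Because the median is computed before the new column is added, the row_median column does not influence its own values.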

Handling Missing Values (skipna=True)

When a DataFrame contains missing values, the skipna=True parameter (the default setting) excludes NA/null values from the computation, so the median is calculated from the available data only.


import pandas as pd

# Sample DataFrame with missing values
df = pd.DataFrame({
    'A': [2, None, 6, 8, 10],
    'B': [1, 3, None, 7, 9],
    'C': [4, 6, 1, 3, None]
})

# Compute the median of each column, excluding NA/null values
df2 = df.median(skipna=True)
print("Column-wise medians excluding NA values:\n", df2)

# Output:
# Column-wise medians excluding NA values:
# A    7.0
# B    5.0
# C    3.5
# dtype: float64

Here,

  • Column A: The values are [2, 6, 8, 10] (excluding None). The median is 7.0.
  • Column B: The values are [1, 3, 7, 9] (excluding None). The median is 5.0.
  • Column C: The values are [4, 6, 1, 3] (excluding None). The median is 3.5.
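One way to convince yourself of this behavior is to drop the missing values by hand and compare. The following sketch assumes the same DataFrame as above; non_null_counts and manual_a are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [2, None, 6, 8, 10],
    'B': [1, 3, None, 7, 9],
    'C': [4, 6, 1, 3, None]
})

non_null_counts = df.count()            # number of values actually used per column
manual_a = df['A'].dropna().median()    # drop NaN first, then take the median

print(non_null_counts)
print(manual_a == df.median()['A'])
```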

Handling Missing Values (skipna=False)

With skipna=False, missing values (NA/null) are no longer skipped. Any column that contains at least one NA/null value returns NaN, because the median of a set with an unknown value is itself unknown.


# Compute the median of each column, including NA/null values
df2 = df.median(skipna=False)
print("Column-wise medians including NA values:\n", df2)

# Output:
# Column-wise medians including NA values:
# A   NaN
# B   NaN
# C   NaN
# dtype: float64

Here,

  • Column A: Contains [2, None, 6, 8, 10]. Since there’s at least one None, the median result is NaN.
  • Column B: Contains [1, 3, None, 7, 9]. The median result is NaN due to the presence of None.
  • Column C: Contains [4, 6, 1, 3, None]. The median result is NaN because of the None value.
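Note that skipna=False only affects columns that actually contain a missing value. A minimal sketch with hypothetical data, where one column is incomplete and the other is complete:

```python
import pandas as pd

# Hypothetical data: column 'A' has a missing value, column 'B' is complete
df_partial = pd.DataFrame({
    'A': [2, None, 6, 8, 10],
    'B': [1, 3, 5, 7, 9]
})

strict_medians = df_partial.median(skipna=False)
print(strict_medians)
```

Here column A yields NaN while column B still yields its ordinary median.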

Median for Numeric Data Only

To compute the median only for the numeric columns in a DataFrame, pass numeric_only=True to the median() method. This excludes non-numeric columns from the calculation. In pandas 2.0 and later this argument is needed for a DataFrame like the one below, because numeric_only defaults to False.


import pandas as pd

# Sample DataFrame with mixed data types
df = pd.DataFrame({
    'A': [2, 4, 'Pandas', 8, 10],
    'B': [1, 3, 5, 7, 9],
    'C': [4, 6, 1, 9, 3]
})

# Compute the median, including only numeric data
df2 = df.median(numeric_only=True)
print("Medians for numeric data only:\n", df2)

# Output:
# Medians for numeric data only:
# B    5.0
# C    4.0
# dtype: float64

Here,

  • Column A: Contains both numeric and non-numeric data (the string 'Pandas'), so it is stored with object dtype and excluded from the median calculation.
  • Column B: Contains numeric data [1, 3, 5, 7, 9]. The median is 5.0.
  • Column C: Contains numeric data [4, 6, 1, 9, 3]. The median is 4.0.
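An alternative way to get the same result is to select the numeric columns explicitly before calling median(). The sketch below uses select_dtypes(include='number') on the same sample data; numeric_part is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [2, 4, 'Pandas', 8, 10],   # object dtype because of the string
    'B': [1, 3, 5, 7, 9],
    'C': [4, 6, 1, 9, 3]
})

numeric_part = df.select_dtypes(include='number')   # keeps only B and C
medians = numeric_part.median()

print(list(numeric_part.columns))
print(medians)
```

This makes the column selection visible in the code, which can help when you want to inspect which columns were dropped.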

Median along a Specific Level (MultiIndex DataFrame)

To compute the median along a specific level of a MultiIndex DataFrame, older pandas versions accepted a level parameter in the median() method. That parameter was deprecated in pandas 1.3 and removed in pandas 2.0, so the example below groups by the index level with groupby(level=...) and then calls median(), which works in both older and newer versions. This is useful when working with hierarchical data and you want the median for each group defined by a particular level of the MultiIndex.


import pandas as pd

# Sample MultiIndex DataFrame
df_multi = pd.DataFrame({
    'A': [2, 4, 6, 8],
    'B': [1, 3, 5, 7],
    'C': [4, 6, 1, 9]
}, index=pd.MultiIndex.from_tuples([('X', 1), ('X', 2), ('Y', 1), ('Y', 2)], names=['group', 'subgroup']))
print("Original MultiIndex DataFrame:\n", df_multi)

# Compute the median for each value of the 'group' level
# (df_multi.median(level='group') only works before pandas 2.0)
df2 = df_multi.groupby(level='group').median()
print("Median by group:\n", df2)

# Output:
# Original MultiIndex DataFrame:
#                  A  B  C
# group subgroup
# X     1         2  1  4
#       2         4  3  6
# Y     1         6  5  1
#       2         8  7  9

# Median by group:
#          A    B    C
# group
# X      3.0  2.0  5.0
# Y      7.0  6.0  5.0
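As a further sketch, the same data can be summarized by the other index level. The snippet below rebuilds the sample DataFrame and uses groupby(level='subgroup').median(), a form that does not rely on the level argument of median(); by_subgroup is an illustrative name.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('X', 1), ('X', 2), ('Y', 1), ('Y', 2)],
    names=['group', 'subgroup']
)
df_multi = pd.DataFrame(
    {'A': [2, 4, 6, 8], 'B': [1, 3, 5, 7], 'C': [4, 6, 1, 9]},
    index=index
)

# Group rows by the 'subgroup' index level, then take each group's median
by_subgroup = df_multi.groupby(level='subgroup').median()
print(by_subgroup)
```

Subgroup 1 combines the rows ('X', 1) and ('Y', 1), so its median for column C is the mean of 4 and 1, which is 2.5.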

Frequently Asked Questions on Pandas DataFrame median() Method

What does the median() method do in Pandas?

The median() method calculates the median of the numeric columns in a DataFrame. By default, it computes the median for each column (axis=0).

How does the median() method handle missing values?

By default, the median() method excludes missing values (NA/null) from the calculation with skipna=True. With skipna=False, the result is NaN for any column or row that contains a missing value.

Can I use the median() method with non-numeric data?

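The median() method is designed to work with numeric data. In pandas 2.0 and later, numeric_only defaults to False, so a column that cannot be treated as numeric (such as one holding strings) raises a TypeError; pass numeric_only=True to skip such columns. Older versions dropped non-numeric columns by default.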

Is it possible to calculate the median for both rows and columns in one DataFrame?

The median() method can only compute the median for either rows or columns in a single call. You need to call the method twice, once with axis=0 and once with axis=1, to get both row-wise and column-wise medians.
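A minimal sketch of the two calls side by side, assuming the sample DataFrame used earlier in this article:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [2, 4, 6, 8, 10],
    'B': [1, 3, 5, 7, 9],
    'C': [4, 6, 1, 9, 3]
})

column_medians = df.median(axis=0)   # one value per column (3 values)
row_medians = df.median(axis=1)      # one value per row (5 values)

print(len(column_medians), len(row_medians))
```

The two results have different lengths and different indexes, which is why they come back as separate Series.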

How does the median() method handle mixed data types in a single column?

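If a column contains mixed data types (e.g., numbers and strings), pandas stores it with object dtype. With numeric_only=True, the entire column is excluded from the result. With numeric_only=False (the default in pandas 2.0 and later), the call typically raises a TypeError because the strings cannot be converted to numbers.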
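If you would rather keep such a column than drop it, one possible approach (an assumption on my part, not something median() does for you) is to convert it with pd.to_numeric(errors='coerce') first, which turns unparseable entries into NaN. The names df_clean and medians below are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [2, 4, 'Pandas', 8, 10],
    'B': [1, 3, 5, 7, 9]
})

# errors='coerce' turns values that cannot be parsed as numbers into NaN
df_clean = df.assign(A=pd.to_numeric(df['A'], errors='coerce'))
medians = df_clean.median()   # the NaN in 'A' is skipped by default (skipna=True)
print(medians)
```

Column A is then computed from [2, 4, 8, 10], giving 6.0. Whether silently discarding the bad entry is acceptable depends on your data.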

Conclusion

In conclusion, the Pandas median() method is an essential tool for statistical analysis within DataFrames. It provides flexibility in calculating medians across different axes and handling missing values effectively.

Happy Learning!!