In Pandas, the median() method is used to compute the median of values for each column or row in a DataFrame, excluding NA/null values by default.
In this article, I will explain the Pandas DataFrame median() method by using its syntax, parameters, and usage, and how to return the median of the values along the specified axis.
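Key Points –
- Computes the median of the values along the specified axis of the DataFrame.
- Accepts axis=0 (default) to compute the median for each column and axis=1 to compute the median for each row.
- By default, skipna=True excludes NA/null values from the computation. With skipna=False, any NA/null value in a column or row makes its median NaN.
- The numeric_only parameter controls whether non-numeric columns are excluded. Before pandas 2.0 the default (None) silently dropped columns that could not be reduced; since pandas 2.0 the default is False, and a non-numeric column typically raises a TypeError unless you pass numeric_only=True.
- The level parameter computed the median along a particular level of a MultiIndex, collapsing the DataFrame accordingly. It was deprecated in pandas 1.3 and removed in pandas 2.0; use groupby(level=...).median() instead.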
Syntax of Pandas DataFrame median() Method
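Following is the syntax of the pandas DataFrame.median() method.
# Syntax of DataFrame.median() method as used in this article (pandas before 2.0)
DataFrame.median(axis=0, skipna=True, level=None, numeric_only=None, **kwargs)
# Syntax in pandas 2.0 and later (level removed, numeric_only defaults to False)
DataFrame.median(axis=0, skipna=True, numeric_only=False, **kwargs)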
Parameters of the median()
Following are the parameters of the median() method.
- axis: {index (0), columns (1)}. The axis along which to compute the median. The default is axis=0, which calculates the median for each column.
- skipna: bool, default True. Exclude NA/null values when computing the median. If False, NA/null values are included in the computation, and any row or column containing one has a NaN median.
- level: int or level name, default None. If the axis is a MultiIndex (hierarchical), compute the median along a particular level, collapsing into a Series. This parameter is only available before pandas 2.0.
- numeric_only: bool. Include only float, int, and boolean data. Before pandas 2.0 the default was None, which attempted to use everything and then fell back to numeric data only; since pandas 2.0 the default is False.
- **kwargs: Additional keyword arguments to be passed to the method.
Return Value
The method returns a Series containing the median values for the requested axis.
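As a quick illustration of the return value, the sketch below checks the type, index, and dtype of the result for both axes. The variable names col_medians and row_medians are chosen here for illustration only.

```python
import pandas as pd

# Small illustrative frame with three integer columns
df = pd.DataFrame({
    "A": [2, 4, 6, 8, 10],
    "B": [1, 3, 5, 7, 9],
    "C": [4, 6, 1, 9, 3],
})

col_medians = df.median()        # axis=0: one entry per column
row_medians = df.median(axis=1)  # axis=1: one entry per row

print(type(col_medians).__name__)  # Series
print(list(col_medians.index))     # ['A', 'B', 'C']
print(list(row_medians.index))     # [0, 1, 2, 3, 4]
print(col_medians.dtype)           # float64
```

The result is indexed by column labels for axis=0 and by row labels for axis=1, and the values are floats even when the input columns hold integers.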
Usage of Pandas DataFrame median() Method
The median() method in Pandas is used to calculate the median (the middle value) of numeric data along a specified axis of a DataFrame.
To run some examples of the pandas DataFrame median() method, let’s create a Pandas DataFrame from a Python dictionary, with the columns A, B, and C.
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({
'A': [2, 4, 6, 8, 10],
'B': [1, 3, 5, 7, 9],
'C': [4, 6, 1, 9, 3]
})
print("Original DataFrame:\n",df)
Yields below output.
# Output:
# Original DataFrame:
#     A  B  C
# 0   2  1  4
# 1   4  3  6
# 2   6  5  1
# 3   8  7  9
# 4  10  9  3
Column-wise Median (Default)
To compute the column-wise median of a DataFrame (which is the default behavior of the median() method), you can simply call the median() method without specifying the axis parameter, as it defaults to axis=0.
# Compute the median of each column
column_medians = df.median()
print("Column-wise medians:\n", column_medians)
# Output:
# Column-wise medians:
# A 6.0
# B 5.0
# C 4.0
# dtype: float64
Here,
- Column A: The values are [2, 4, 6, 8, 10]. The median is 6.0.
- Column B: The values are [1, 3, 5, 7, 9]. The median is 5.0.
- Column C: The values are [4, 6, 1, 9, 3]. The median is 4.0.
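To see why column C yields 4.0 even though 1 sits in the middle position, it may help to recompute the medians by hand. The sketch below uses Python's statistics module as an independent cross-check; the name manual is only illustrative.

```python
import statistics
import pandas as pd

df = pd.DataFrame({
    "A": [2, 4, 6, 8, 10],
    "B": [1, 3, 5, 7, 9],
    "C": [4, 6, 1, 9, 3],
})

# The median is the middle value of the *sorted* data, not of the original order
print(sorted(df["C"].tolist()))  # [1, 3, 4, 6, 9] -> middle value is 4

# Recompute every column median with the standard library
manual = {name: statistics.median(df[name].tolist()) for name in df.columns}
print(manual)                    # {'A': 6, 'B': 5, 'C': 4}

# pandas reports the same numbers, as floats
print(df.median().to_dict())     # {'A': 6.0, 'B': 5.0, 'C': 4.0}
```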
Row-wise Median
Alternatively, to compute the median for each row in a DataFrame, you need to specify the axis parameter as 1 in the median() method.
# Compute the median of each row
row_medians = df.median(axis=1)
print("Row-wise medians:\n", row_medians)
# Output:
# Row-wise medians:
# 0 2.0
# 1 4.0
# 2 5.0
# 3 8.0
# 4 9.0
# dtype: float64
Here,
- Row 0: The values are [2, 1, 4]. The median is 2.0.
- Row 1: The values are [4, 3, 6]. The median is 4.0.
- Row 2: The values are [6, 5, 1]. The median is 5.0.
- Row 3: The values are [8, 7, 9]. The median is 8.0.
- Row 4: The values are [10, 9, 3]. The median is 9.0.
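A common follow-up is to keep the row-wise median next to the data it summarizes. The sketch below is one possible way to do that with assign(); the column name row_median is hypothetical and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [2, 4, 6, 8, 10],
    "B": [1, 3, 5, 7, 9],
    "C": [4, 6, 1, 9, 3],
})

# df.median(axis=1) is evaluated on the original three columns,
# and assign() returns a new DataFrame with the extra column appended
with_median = df.assign(row_median=df.median(axis=1))

print(with_median["row_median"].tolist())  # [2.0, 4.0, 5.0, 8.0, 9.0]
print(list(with_median.columns))           # ['A', 'B', 'C', 'row_median']
```

Because assign() does not modify df in place, calling df.median(axis=1) again later still uses only A, B, and C.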
Handling Missing Values (skipna=True)
When calculating the median in a DataFrame and handling missing values, the skipna=True parameter (which is the default setting) ensures that NA/null values are excluded from the computation. This means that missing values are ignored, and the median is calculated based only on the available data.
import pandas as pd
import numpy as np
# Sample DataFrame with missing values
df = pd.DataFrame({
'A': [2, None, 6, 8, 10],
'B': [1, 3, None, 7, 9],
'C': [4, 6, 1, 3, None]
})
# Compute the median of each column, excluding NA/null values
df2 = df.median(skipna=True)
print("Column-wise medians excluding NA values:\n", df2)
# Output:
# Column-wise medians excluding NA values:
# A 7.0
# B 5.0
# C 3.5
# dtype: float64
Here,
- Column A: The values are [2, 6, 8, 10] (excluding None). The median is 7.0.
- Column B: The values are [1, 3, 7, 9] (excluding None). The median is 5.0.
- Column C: The values are [4, 6, 1, 3] (excluding None). The median is 3.5.
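Since skipna=True silently drops missing entries, it can be useful to see how many values each median is actually based on. The sketch below pairs count() with median(); the summary frame and its column names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [2, np.nan, 6, 8, 10],
    "B": [1, 3, np.nan, 7, 9],
    "C": [4, 6, 1, 3, np.nan],
})

# count() reports the non-missing values per column,
# which is exactly the data median() uses when skipna=True
summary = pd.DataFrame({
    "non_null": df.count(),
    "median": df.median(),  # skipna=True is the default
})

print(summary["non_null"].tolist())  # [4, 4, 4]
print(summary["median"].tolist())    # [7.0, 5.0, 3.5]
```

With an even number of remaining values, the median is the mean of the two middle ones, which is why column C gives 3.5.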
Handling Missing Values (skipna=False)
Similarly, when calculating the median and setting skipna=False, the presence of missing values (NA/null) will affect the result. If any NA/null values are present in the data, the result will be NA, as the median cannot be computed properly when there are missing values.
# Compute the median of each column, including NA/null values
df2 = df.median(skipna=False)
print("Column-wise medians including NA values:\n", df2)
# Output:
# Column-wise medians including NA values:
# A NaN
# B NaN
# C NaN
# dtype: float64
Here,
- Column A: Contains [2, None, 6, 8, 10]. Since there’s at least one None, the median result is NaN.
- Column B: Contains [1, 3, None, 7, 9]. The median result is NaN due to the presence of None.
- Column C: Contains [4, 6, 1, 3, None]. The median result is NaN because of the None value.
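One way to read the skipna=False result is as a strict check: a NaN median flags a column that has at least one missing value. The sketch below adds a hypothetical complete column D (not in the original data) to show that only incomplete columns are affected.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [2, np.nan, 6, 8, 10],
    "B": [1, 3, np.nan, 7, 9],
    "D": [1, 2, 3, 4, 5],  # hypothetical column with no missing values
})

strict = df.median(skipna=False)

# Columns with any missing value come back as NaN; complete columns do not
print(strict.isna().tolist())    # [True, True, False]
print(strict["D"])               # 3.0

# The NaN pattern matches a direct check for missing values
print(df.isna().any().tolist())  # [True, True, False]
```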
Median for Numeric Data Only
To compute the median only for numeric columns in a DataFrame, pass numeric_only=True to the median() method. This ensures that non-numeric columns are excluded from the calculation.
import pandas as pd
# Sample DataFrame with mixed data types
df = pd.DataFrame({
'A': [2, 4, 'Pandas', 8, 10],
'B': [1, 3, 5, 7, 9],
'C': [4, 6, 1, 9, 3]
})
# Compute the median, including only numeric data
df2 = df.median(numeric_only=True)
print("Medians for numeric data only:\n", df2)
# Output:
# Medians for numeric data only:
# B 5.0
# C 4.0
# dtype: float64
Here,
- Column A: Contains numeric and non-numeric data (Pandas is non-numeric). This column is excluded from the median calculation.
- Column B: Contains numeric data [1, 3, 5, 7, 9]. The median is 5.0.
- Column C: Contains numeric data [4, 6, 1, 9, 3]. The median is 4.0.
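An equivalent and more explicit approach, shown here as a sketch rather than the article's own method, is to select the numeric columns first with select_dtypes() and then call median(). This makes it visible which columns take part in the calculation.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [2, 4, "Pandas", 8, 10],  # mixed values, stored as object dtype
    "B": [1, 3, 5, 7, 9],
    "C": [4, 6, 1, 9, 3],
})

# Keep only columns with a numeric dtype, then take the median
numeric_part = df.select_dtypes(include="number")
explicit = numeric_part.median()

print(list(numeric_part.columns))  # ['B', 'C']
print(explicit.tolist())           # [5.0, 4.0]

# Same result as letting median() do the filtering
print(explicit.equals(df.median(numeric_only=True)))  # True
```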
Median along a Specific Level (MultiIndex DataFrame)
Finally, you can compute the median for each group defined by a particular level of a MultiIndex, which is useful when working with hierarchical data. In pandas versions before 2.0 this was done by passing the level parameter to median(). That parameter was removed in pandas 2.0, so the example below groups on the index level with groupby(level=...) and then calls median(), which works in both older and newer versions.
import pandas as pd
# Sample MultiIndex DataFrame
df_multi = pd.DataFrame({
'A': [2, 4, 6, 8],
'B': [1, 3, 5, 7],
'C': [4, 6, 1, 9]
}, index=pd.MultiIndex.from_tuples([('X', 1), ('X', 2), ('Y', 1), ('Y', 2)], names=['group', 'subgroup']))
print("Original MultiIndex DataFrame:\n", df_multi)
# Compute the median for each value of the 'group' level
# (pandas before 2.0 also accepted: df_multi.median(level='group'))
df2 = df_multi.groupby(level='group').median()
print("Median by group:\n", df2)
# Output:
# Original MultiIndex DataFrame:
#                 A  B  C
# group subgroup
# X     1         2  1  4
#       2         4  3  6
# Y     1         6  5  1
#       2         8  7  9
# Median by group:
#          A    B    C
# group
# X      3.0  2.0  5.0
# Y      7.0  6.0  5.0
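The same grouping idea applies to any level of the index. The sketch below groups on the inner subgroup level using groupby(level=...).median(), which does not rely on the level argument of median(); the variable name by_subgroup is illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("X", 1), ("X", 2), ("Y", 1), ("Y", 2)],
    names=["group", "subgroup"],
)
df_multi = pd.DataFrame(
    {"A": [2, 4, 6, 8], "B": [1, 3, 5, 7], "C": [4, 6, 1, 9]},
    index=index,
)

# Group rows that share the same 'subgroup' value, then take column medians
by_subgroup = df_multi.groupby(level="subgroup").median()

print(by_subgroup.loc[1].tolist())  # [4.0, 3.0, 2.5]  from rows (X, 1) and (Y, 1)
print(by_subgroup.loc[2].tolist())  # [6.0, 5.0, 7.5]  from rows (X, 2) and (Y, 2)
```

Each group here has two rows, so every median is the mean of the two values in that group.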
Frequently Asked Questions on Pandas DataFrame median() Method
The median() method calculates the median of the numeric columns in a DataFrame. By default, it computes the median for each column (axis=0).
By default, the median() method excludes missing values (NA/null) from the calculation with skipna=True. To include missing values and get NA as the result if any are present, set skipna=False.
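The median() method is designed to work with numeric data. In pandas versions before 2.0, non-numeric columns were silently dropped by default. Since pandas 2.0, numeric_only defaults to False, so a non-numeric column typically raises a TypeError unless you pass numeric_only=True.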
The median() method can only compute the median for either rows or columns in a single call. You need to call the method twice, once with axis=0 and once with axis=1, to get both row-wise and column-wise medians.
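If a column contains mixed data types (e.g., numbers and strings), pandas stores it with the object dtype. Passing numeric_only=True excludes the entire column from the median calculation; with numeric_only=False (the default since pandas 2.0), the call typically raises a TypeError instead.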
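If the numeric entries of a mixed column should still count, one option (an assumption about the desired behavior, not something the method does for you) is to convert the column with pd.to_numeric(errors="coerce") first. Values that cannot be parsed become NaN and are then skipped by the default skipna=True.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [2, 4, "Pandas", 8, 10],  # one non-numeric entry
    "B": [1, 3, 5, 7, 9],
})

# Work on a copy so the original data stays untouched
cleaned = df.copy()
cleaned["A"] = pd.to_numeric(cleaned["A"], errors="coerce")  # 'Pandas' -> NaN

medians = cleaned.median()
print(int(cleaned["A"].isna().sum()))  # 1
print(medians.tolist())                # [6.0, 5.0]
```

Column A now contributes a median based on its four numeric values [2, 4, 8, 10] instead of being dropped entirely.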
Conclusion
In conclusion, the Pandas median() method is an essential tool for statistical analysis within DataFrames. It provides flexibility in calculating medians across different axes and handling missing values effectively.
Happy Learning!!