In pandas, you can find the maximum values in a DataFrame using the max() method. It can be applied along either columns or rows.
In this article, I will explain the pandas DataFrame max() method: its syntax, parameters, and usage, including how to return a Series containing the maximum value for each column.
Key Points –
- By default, max() calculates the maximum along axis=0, returning a Series with the maximum value of each column.
- Setting the axis parameter to 1 makes max() calculate the maximum along each row, returning a Series with the maximum value of each row.
- NA values are skipped by default. Setting skipna=False makes any NA in a column or row propagate to the result.
- The level parameter computes maximum values along one level of a MultiIndex in older pandas. It was deprecated in pandas 1.3 and removed in pandas 2.0, where groupby(level=...).max() is the replacement.
- The numeric_only parameter restricts the calculation to numeric columns. Its default was None before pandas 2.0 and is False from 2.0 onward.
Pandas DataFrame max() Introduction
Let’s look at the syntax of the max() function.
# Syntax of DataFrame max() in pandas 1.x
# (pandas 2.0 removed level and changed the numeric_only default to False)
DataFrame.max(axis=0, skipna=True, level=None, numeric_only=None, **kwargs)
Parameters of the DataFrame max()
Following are the parameters of the DataFrame max() function.
- axis – {0 or 'index', 1 or 'columns'}, default 0. The axis along which to compute the maximum. 0 or 'index' computes the maximum for each column, while 1 or 'columns' computes the maximum for each row.
- skipna – bool, default True. Whether to exclude NA/null values. If True, NA values are ignored. If False, the result is NA wherever any NA value is present.
- level – int or level name, default None. If the axis is a MultiIndex (hierarchical), this specifies the level along which to compute the maximum. Deprecated in pandas 1.3 and removed in pandas 2.0; use groupby(level=...).max() instead.
- numeric_only – bool, default None (False from pandas 2.0). Whether to include only numeric data types. If True, only numeric columns are considered. If None, the behavior depends on the pandas version and the data types present.
- kwargs – Additional keyword arguments passed to the function.
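As a small illustration of numeric_only, the sketch below uses a hypothetical DataFrame that mixes a text column with two numeric ones. The names mixed, name, score, and weight are invented for this example and are not part of the data used elsewhere in this article. Passing numeric_only=True should leave the text column out of the result.

```python
import pandas as pd

# Hypothetical mixed-type data, invented for this illustration
mixed = pd.DataFrame({
    "name": ["x", "y", "z"],      # text column
    "score": [4, 38, 8],          # integer column
    "weight": [1.5, 0.5, 2.0],    # float column
})

# Only the numeric columns take part in the reduction
numeric_max = mixed.max(numeric_only=True)
print(numeric_max)
```

The result is indexed by the numeric column names only, so the name column does not appear in it.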
Return Value
It returns a Series with the maximum values when applied to a DataFrame, and a single scalar value when applied to a Series.
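The difference between the two return types can be checked directly. The following is a minimal sketch with a small made-up DataFrame; the variable names col_max and a_max are illustrative only.

```python
import pandas as pd

# Small made-up DataFrame for this check
df = pd.DataFrame({"A": [4, 38, 8], "B": [52, 5, 49]})

col_max = df.max()       # DataFrame.max() -> Series, one entry per column
a_max = df["A"].max()    # Series.max()    -> single scalar value

print(type(col_max).__name__)
print(a_max)
```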
Usage of Pandas DataFrame max() Function
The max() function in pandas finds the maximum values in a DataFrame or Series.
To run some examples of the pandas DataFrame max() function, let’s create a pandas DataFrame from a dictionary.
import pandas as pd
import numpy as np
# Creating a sample DataFrame
data = {
'A': [4, 38, 8, 24, 15],
'B': [52, 5, 49, 18, 31],
'C': [9, 13, 84, 53, 22]
}
df = pd.DataFrame(data)
print("Original DataFrame:\n", df)
# Output:
# Original DataFrame:
#     A   B   C
# 0   4  52   9
# 1  38   5  13
# 2   8  49  84
# 3  24  18  53
# 4  15  31  22
Maximum Value in Each Column
To find the maximum value in each column of a pandas DataFrame, call max() with no arguments. Applied to a DataFrame this way, it returns a Series containing the maximum value of each column.
# Maximum value in each column
df2 = df.max()
print("Maximum value in each column:\n", df2)
Here,
- The max() function is applied to the DataFrame df.
- The function returns a Series where each element is the maximum value of the corresponding column.
  - Column ‘A’ has a maximum value of 38.
  - Column ‘B’ has a maximum value of 52.
  - Column ‘C’ has a maximum value of 84.
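Because the result is an ordinary Series indexed by column name, individual maxima can be looked up by label. The sketch below rebuilds the sample DataFrame so it runs on its own; the names col_max and subset_max are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [4, 38, 8, 24, 15],
    "B": [52, 5, 49, 18, 31],
    "C": [9, 13, 84, 53, 22],
})

col_max = df.max()

# Look up one column's maximum by its label
print(col_max["A"])

# View all column/maximum pairs as a plain dictionary
print(col_max.to_dict())

# Restrict the calculation to a subset of columns
subset_max = df[["A", "C"]].max()
print(subset_max)
```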
Maximum Value in Each Row
To find the maximum value in each row of a pandas DataFrame, use the max() function with axis=1. This tells pandas to calculate the maximum value across all columns for each row.
# Maximum value in each row
df2 = df.max(axis=1)
print("Maximum value in each row:\n", df2)
# Output:
# Maximum value in each row:
# 0    52
# 1    38
# 2    84
# 3    53
# 4    31
# dtype: int64
Here,
- The axis=1 parameter specifies that the maximum should be calculated across the columns (horizontally) for each row.
- The function returns a Series where each element is the maximum value in the corresponding row.
  - Row 0 has a maximum value of 52.
  - Row 1 has a maximum value of 38.
  - Row 2 has a maximum value of 84.
  - Row 3 has a maximum value of 53.
  - Row 4 has a maximum value of 31.
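A common follow-up is to keep the row maxima next to the original data. One way to sketch this, assuming a new column called row_max (a name picked for this example), is with assign(), which returns a new DataFrame rather than modifying df.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [4, 38, 8, 24, 15],
    "B": [52, 5, 49, 18, 31],
    "C": [9, 13, 84, 53, 22],
})

# assign() returns a new DataFrame and leaves df unchanged
with_max = df.assign(row_max=df.max(axis=1))
print(with_max)
```

Using assign() here avoids adding the new column to df itself, so a later df.max(axis=1) would not accidentally include it.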
Maximum Value Across the Entire DataFrame
To find the maximum value across the entire DataFrame, combine the values attribute with max(). The values attribute exposes the data as a single NumPy array, and calling max() on that array without an axis returns the maximum over all of its elements.
# Maximum value across the entire dataframe
df2 = df.values.max()
print("Overall maximum value in the DataFrame:", df2)
# Output:
# Overall maximum value in the DataFrame: 84
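There are a couple of equivalent ways to get the same overall maximum. The sketch below shows two that should work across pandas versions; the names overall_a and overall_b are illustrative. Newer pandas releases (2.0 and later, as far as I know) also accept df.max(axis=None), which is left out of the code so the sketch runs on older versions too.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [4, 38, 8, 24, 15],
    "B": [52, 5, 49, 18, 31],
    "C": [9, 13, 84, 53, 22],
})

# to_numpy() returns the data as a NumPy array, like the values attribute
overall_a = df.to_numpy().max()

# Maximum of the per-column maxima
overall_b = df.max().max()

print(overall_a, overall_b)
```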
Maximum Value with NaN Values (Handling NA Values)
When a DataFrame or Series contains NaN (Not a Number) values, the skipna parameter determines how the max() function treats them.
Default Behavior (Skipping NaN Values)
By default (skipna=True), the max() function ignores NaN values and returns the maximum of the remaining non-NaN values.
import pandas as pd
import numpy as np
# Creating a DataFrame with NaN values
data = {
'A': [4, np.nan, 8, 24, 15],
'B': [52, 5, 49, np.nan, 31],
'C': [9, 13, 84, 53, np.nan]
}
df = pd.DataFrame(data)
# Maximum value in each column, skipping NaN by default
df2 = df.max()
print("Maximum value in each column (skipping NaN):\n", df2)
# Output:
# Maximum value in each column (skipping NaN):
# A    24.0
# B    52.0
# C    84.0
# dtype: float64
Including NaN Values
If you want the max() function to take NaN values into account, set the skipna parameter to False. With skipna=False, the result is NaN for any column or row that contains at least one NaN value.
# Maximum value in each column, considering NaN values
df2 = df.max(skipna=False)
print("Maximum value in each column (considering NaN):\n", df2)
# Output:
# Maximum value in each column (considering NaN):
# A   NaN
# B   NaN
# C   NaN
# dtype: float64
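The same parameter applies row-wise. As a sketch, the NaN-containing DataFrame is rebuilt below and reduced along axis=1 both ways; row_max_skip and row_max_strict are names chosen for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [4, np.nan, 8, 24, 15],
    "B": [52, 5, 49, np.nan, 31],
    "C": [9, 13, 84, 53, np.nan],
})

# Default: NaN values are skipped within each row
row_max_skip = df.max(axis=1)

# skipna=False: a row containing any NaN yields NaN
row_max_strict = df.max(axis=1, skipna=False)

print(row_max_skip)
print(row_max_strict)
```

Only rows 0 and 2 are free of NaN values in this data, so they are the only rows that keep a numeric result when skipna=False.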
Using Level Parameter
Finally, the level parameter of the max() function applies to pandas DataFrames or Series that have a MultiIndex (hierarchical index). In pandas versions before 2.0, it computes the maximum within each value of the given index level, collapsing the other levels. The parameter was deprecated in pandas 1.3 and removed in pandas 2.0; df.groupby(level=...).max() gives the same result and works in both older and newer versions.
import pandas as pd
import numpy as np
# Creating a MultiIndex
index = pd.MultiIndex.from_tuples([
('A', 1), ('A', 2), ('B', 1), ('B', 2)
], names=['Group', 'Subgroup'])
# Creating a DataFrame with the MultiIndex
data = {
'Values1': [10, 20, 15, 30],
'Values2': [40, 25, 35, 45]
}
df = pd.DataFrame(data, index=index)
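# Maximum value within each 'Group' (level 0), collapsing 'Subgroup'
# df.max(level='Group') works only on pandas < 2.0; groupby(level=...) works on all versions
df2 = df.groupby(level='Group').max()
print("Maximum value by Group:\n", df2)
# Output:
# Maximum value by Group:
#        Values1  Values2
# Group
# A           20       40
# B           30       45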
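For the same MultiIndex data, a grouped maximum over the other level (Subgroup) can be sketched with groupby(level=...).max(). This form is used here because, to my knowledge, the level argument of max() is not available in pandas 2.0 and later. The name by_subgroup is illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("A", 1), ("A", 2), ("B", 1), ("B", 2)],
    names=["Group", "Subgroup"],
)
df = pd.DataFrame(
    {"Values1": [10, 20, 15, 30], "Values2": [40, 25, 35, 45]},
    index=index,
)

# Maximum within each Subgroup value, collapsing the Group level
by_subgroup = df.groupby(level="Subgroup").max()
print(by_subgroup)
```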
Frequently Asked Questions on Pandas DataFrame max() Function
The max() function returns the maximum value from the DataFrame or Series. When applied to a DataFrame, it computes the maximum value for each column by default.
You can use df.max() to find the maximum value in each column of the DataFrame df. By default, this function operates along the columns (axis=0).
To find the maximum value in each row, use df.max(axis=1). This calculates the maximum value across columns for each row.
The skipna parameter controls whether NaN values are ignored in the calculation. By default, skipna=True, meaning NaN values are skipped. If you set skipna=False, any NaN present makes the result NaN.
The level parameter is used with MultiIndex DataFrames to compute the maximum value along a specified level of the index. For example, df.max(level='Group') calculates the maximum value for each group while collapsing the other levels. This form works only in pandas versions before 2.0; df.groupby(level='Group').max() is the equivalent that also works in newer versions.
Conclusion
In conclusion, the max() method in pandas provides a versatile way to identify the maximum values in a DataFrame. By default, it returns a Series with the maximum value of each column. It can also be adjusted to compute maximum values along rows, handle NA values, and work with MultiIndex DataFrames. Understanding parameters such as axis, skipna, and level allows for more precise and flexible data analysis.
Happy Learning!!