  • Post category: Pandas
  • Post last modified: August 20, 2024
Pandas DataFrame max() Function

In pandas, you can find the maximum value in a DataFrame using the max() function. This function can be applied to either rows or columns.


In this article, I will explain the syntax, parameters, and usage of the Pandas DataFrame max() function, and show how it returns a Series containing the maximum value for each column.

Key Points –

  • By default (axis=0), max() computes the maximum of each column, returning a Series indexed by column label.
  • By setting the axis parameter to 1, max() computes the maximum of each row, returning a Series indexed by row label.
  • The function skips NA values by default, but this behavior can be controlled using the skipna parameter to include or exclude NA values from the calculation.
  • When working with a MultiIndex DataFrame, older pandas versions let the level parameter compute maximum values along a specific level of the index. This parameter was removed in pandas 2.0; use df.groupby(level=...).max() instead.
  • The numeric_only parameter restricts the calculation to numeric columns. Its default changed from None to False in pandas 2.0.

Pandas DataFrame max() Introduction

Let’s look at the syntax of the max() function.


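# Syntax of DataFrame max() in pandas 2.x
DataFrame.max(axis=0, skipna=True, numeric_only=False, **kwargs)
# pandas 1.x also accepted level (deprecated in 1.3, removed in 2.0)
DataFrame.max(axis=0, skipna=True, level=None, numeric_only=None, **kwargs)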

Parameters of the DataFrame max()

Following are the parameters of the DataFrame max() function.

  • axis – {0 or 'index', 1 or 'columns'}, default 0. The axis along which to compute the maximum. 0 (or 'index') computes the maximum for each column, while 1 (or 'columns') computes the maximum for each row.
  • skipna – bool, default True. Whether to exclude NA/null values. If True, NA values are ignored. If False, the result is NA for any column (or row) that contains an NA value.
  • level – int or level name, default None. If the axis is a MultiIndex (hierarchical), the level along which to compute the maximum. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0; use groupby(level=...).max() instead.
  • numeric_only – bool, default False (None in pandas 1.x). If True, only float, int, and boolean columns are included. In pandas 1.x, the default None attempted to use all columns and then fell back to numeric data only.
  • kwargs – Additional keyword arguments passed to other methods or functions.
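To see numeric_only in action, here is a minimal sketch using a small made-up DataFrame (the Name, Score, and Age columns are purely illustrative) that mixes text and numbers. With numeric_only=True the text column is left out of the result; without it, pandas compares the strings alphabetically and includes them.

```python
import pandas as pd

# Illustrative DataFrame mixing a text column with two numeric columns
df = pd.DataFrame({
    'Name': ['Ana', 'Ben', 'Cara'],
    'Score': [88, 95, 79],
    'Age': [31, 24, 45],
})

# Only numeric columns take part; 'Name' is excluded from the result
numeric_max = df.max(numeric_only=True)
print(numeric_max)

# Without numeric_only, the string column is compared alphabetically
all_max = df.max()
print(all_max)
```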

Return Value

It returns a Series with the maximum values when applied to a DataFrame, and a single scalar value when applied to a Series.
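A short sketch that makes the two return types concrete, using the same values as the sample DataFrame built later in this article: calling max() on the whole DataFrame gives a Series, while calling it on a single column gives a scalar.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [4, 38, 8, 24, 15],
    'B': [52, 5, 49, 18, 31],
    'C': [9, 13, 84, 53, 22]
})

# DataFrame.max() -> Series indexed by column label
col_max = df.max()
print(type(col_max).__name__)  # Series

# Series.max() on a single column -> one scalar value
a_max = df['A'].max()
print(a_max)  # 38
```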

Usage of Pandas DataFrame max() Function

The max() function in pandas is used to find the maximum values in a DataFrame or Series.

To run some examples of the pandas DataFrame max() function, let’s create a Pandas DataFrame using data from a dictionary.


import pandas as pd
import numpy as np

# Creating a sample DataFrame
data = {
    'A': [4, 38, 8, 24, 15],
    'B': [52, 5, 49, 18, 31],
    'C': [9, 13, 84, 53, 22]
}

df = pd.DataFrame(data)
print("Original DataFrame:\n", df)

# Output:
# Original DataFrame:
#     A   B   C
# 0   4  52   9
# 1  38   5  13
# 2   8  49  84
# 3  24  18  53
# 4  15  31  22

Maximum Value in Each Column

To find the maximum value in each column of a pandas DataFrame, call max() without any arguments. With the default axis=0, it returns a Series containing the maximum value of each column.


# Maximum value in each column
df2 = df.max()
print("Maximum value in each column:\n", df2)

Here,

  • The max() function is applied to the DataFrame df.
  • The function returns a Series where each element represents the maximum value in the corresponding column.
    • Column ‘A’ has a maximum value of 38.
    • Column ‘B’ has a maximum value of 52.
    • Column ‘C’ has a maximum value of 84.

Maximum Value in Each Row

To find the maximum value in each row of a pandas DataFrame, you can use the max() function with the axis=1 parameter. This tells pandas to calculate the maximum value across all columns for each row.


# Maximum value in each row
df2 = df.max(axis=1)
print("Maximum value in each row:\n", df2)

# Output:
# Maximum value in each row:
# 0    52
# 1    38
# 2    84
# 3    53
# 4    31
# dtype: int64

Here,

  • The axis=1 parameter specifies that the maximum should be calculated across the columns (horizontally) for each row.
  • The function returns a Series where each element represents the maximum value in the corresponding row.
    • Row 0 has a maximum value of 52.
    • Row 1 has a maximum value of 38.
    • Row 2 has a maximum value of 84.
    • Row 3 has a maximum value of 53.
    • Row 4 has a maximum value of 31.
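The parameter list notes that axis also accepts the names 'index' and 'columns'. A quick sketch, rebuilding the sample DataFrame, confirming that the numeric and named spellings produce identical results:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [4, 38, 8, 24, 15],
    'B': [52, 5, 49, 18, 31],
    'C': [9, 13, 84, 53, 22]
})

# axis=1 and axis='columns' both compute one maximum per row
row_max_num = df.max(axis=1)
row_max_name = df.max(axis='columns')
print(row_max_num.equals(row_max_name))  # True

# axis=0 (the default) and axis='index' both compute one maximum per column
print(df.max().equals(df.max(axis='index')))  # True
```

Which spelling to use is a matter of taste; the named form can make row-wise versus column-wise intent easier to read at a glance.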

Maximum Value Across the Entire DataFrame

To find the maximum value across the entire DataFrame, you can call max() on the DataFrame's underlying NumPy array, available through the values attribute (pandas recommends the equivalent to_numpy() method). Without an axis argument, NumPy's max() reduces over every element of the 2-D array and returns a single scalar.


# Maximum value across the entire dataframe
df2 = df.values.max()
print("Overall maximum value in the DataFrame:", df2)

# Output:
# Overall maximum value in the DataFrame: 84
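One caveat: df.values.max() is NumPy's max(), which does not skip NaN. The sketch below uses a small illustrative DataFrame to compare it with chaining pandas' max() twice (which respects the default skipna=True) and with NumPy's np.nanmax(). pandas 2.0 and later also document df.max(axis=None) for a whole-frame maximum; the sketch avoids it so that it behaves the same on older versions.

```python
import numpy as np
import pandas as pd

# Small illustrative DataFrame with one NaN per column
df = pd.DataFrame({
    'A': [4, np.nan, 8],
    'B': [52, 5, np.nan]
})

# pandas: per-column max (NaN skipped), then max of that Series
overall = df.max().max()
print(overall)  # 52.0

# NumPy's ndarray.max() propagates NaN instead of skipping it
raw = df.to_numpy().max()
print(raw)  # nan

# np.nanmax() ignores NaN, like pandas' default skipna=True
print(np.nanmax(df.to_numpy()))  # 52.0
```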

Maximum Value with NaN Values (Handling NA Values)

The max() function also lets you control how NaN (Not a Number) values in a DataFrame or Series are handled.

Default Behavior (Skipping NaN Values)

By default (skipna=True), the max() function skips NaN values and returns the maximum of the remaining, non-NaN values.


import pandas as pd
import numpy as np

# Creating a DataFrame with NaN values
data = {
    'A': [4, np.nan, 8, 24, 15],
    'B': [52, 5, 49, np.nan, 31],
    'C': [9, 13, 84, 53, np.nan]
}

df = pd.DataFrame(data)

# Maximum value in each column, skipping NaN by default
df2 = df.max()
print("Maximum value in each column (skipping NaN):\n", df2)

# Output:
# Maximum value in each column (skipping NaN):
# A    24.0
# B    52.0
# C    84.0
# dtype: float64

Including NaN Values

If you want NaN values to affect the result, set the skipna parameter to False. With skipna=False, the result is NaN for every column (or row) that contains at least one NaN. In the DataFrame above, each column contains a NaN, so every column's maximum becomes NaN.


# Maximum value in each column, considering NaN values
df2 = df.max(skipna=False)
print("Maximum value in each column (considering NaN):\n", df2)

# Output:
# Maximum value in each column (considering NaN):
# A   NaN
# B   NaN
# C   NaN
# dtype: float64
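Because NaN only poisons the rows or columns it appears in, combining skipna=False with axis=1 shows the effect more selectively. A minimal sketch on the same NaN DataFrame: rows 0 and 2 contain no NaN and keep their maxima (52.0 and 84.0), while rows 1, 3, and 4 each contain a NaN and therefore return NaN.

```python
import numpy as np
import pandas as pd

data = {
    'A': [4, np.nan, 8, 24, 15],
    'B': [52, 5, 49, np.nan, 31],
    'C': [9, 13, 84, 53, np.nan]
}
df = pd.DataFrame(data)

# Row-wise maximum with NaN propagation: any row containing NaN becomes NaN
row_max = df.max(axis=1, skipna=False)
print(row_max)
```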

Using Level Parameter

Finally, the level parameter of max() was used with DataFrames or Series that have a MultiIndex (hierarchical index). It computes the maximum value for each label of a particular level, collapsing the remaining levels. The level parameter was deprecated in pandas 1.3 and removed in pandas 2.0; the equivalent groupby(level=...).max() call works in both older and current versions and is used below.


import pandas as pd
import numpy as np

# Creating a MultiIndex
index = pd.MultiIndex.from_tuples([
    ('A', 1), ('A', 2), ('B', 1), ('B', 2)
], names=['Group', 'Subgroup'])

# Creating a DataFrame with the MultiIndex
data = {
    'Values1': [10, 20, 15, 30],
    'Values2': [40, 25, 35, 45]
}
df = pd.DataFrame(data, index=index)

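# Maximum value for each 'Group' (level 0), collapsing the 'Subgroup' level
# df.max(level='Group') worked in older pandas but was removed in pandas 2.0
df2 = df.groupby(level='Group').max()
print("Maximum value by Group:\n", df2)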

# Output:
# Maximum value by Group:
#         Values1  Values2
# Group                  
# A           20       40
# B           30       45
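The same pattern works for the other level. A sketch that rebuilds the MultiIndex DataFrame from the example above, takes the maximum per 'Subgroup' instead, and shows that levels can also be referenced by position. The groupby(level=...) form is used because the level argument of max() itself no longer exists in pandas 2.0.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([
    ('A', 1), ('A', 2), ('B', 1), ('B', 2)
], names=['Group', 'Subgroup'])
df = pd.DataFrame({
    'Values1': [10, 20, 15, 30],
    'Values2': [40, 25, 35, 45]
}, index=index)

# Maximum per 'Subgroup' (level 1), collapsing the 'Group' level
by_subgroup = df.groupby(level='Subgroup').max()
print(by_subgroup)

# Levels can also be referenced by position: level=0 is 'Group'
by_group = df.groupby(level=0).max()
print(by_group)
```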

Frequently Asked Questions on Pandas DataFrame max() Function

What does the max() function do in pandas?

The max() function returns the maximum value from the DataFrame or Series. When applied to a DataFrame, it computes the maximum value for each column by default.

How do I find the maximum value in each column of a DataFrame?

You can use df.max() to find the maximum value in each column of the DataFrame df. By default, this function operates along the columns (axis=0).

How do I find the maximum value in each row?

To find the maximum value in each row, use df.max(axis=1). This calculates the maximum value across columns for each row.

How does the skipna parameter work?

The skipna parameter controls whether NaN values are ignored in the calculation. By default, skipna=True, meaning NaN values are skipped. If you set skipna=False, any column (or row, with axis=1) that contains a NaN returns NaN as its maximum.

How can I use the level parameter in the max() function?

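The level parameter was used with MultiIndex DataFrames to compute the maximum value along a specified level of the index; for example, df.max(level='Group') calculated the maximum for each group while collapsing the other levels. It was deprecated in pandas 1.3 and removed in pandas 2.0, so in current versions use df.groupby(level='Group').max() instead.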

Conclusion

In conclusion, the max() method in pandas provides a versatile way to identify the maximum values in a DataFrame. By default, it returns a Series with the maximum value of each column. It can also compute maximum values along rows, handle NA values, and, combined with groupby(level=...), work with MultiIndex DataFrames. Understanding parameters such as axis, skipna, and numeric_only (plus level in pandas versions before 2.0) allows for more precise and flexible data analysis.

Happy Learning!!
