  • Post last modified:December 5, 2024
pandas.DataFrame.mean() Examples

The DataFrame.mean() function returns the mean of the values over the requested axis in pandas. By default it returns a Series. In pandas versions before 2.0, specifying the level parameter returned a DataFrame instead; that parameter was removed in pandas 2.0.


Key Points –

  • The mean is the sum of all the values divided by the number of values.
  • Calculates the mean of numeric columns only; text columns cannot be averaged.
  • By default (axis=0), calculates the mean down the index axis, giving one value per column.
  • Setting axis=1 calculates the mean across the columns, giving one value per row.
  • By default (skipna=True), NaN values are ignored.
  • When non-numeric columns are included, older pandas versions drop them (with a FutureWarning in later 1.x releases) and pandas 2.0 and later raise a TypeError, unless you use numeric_only=True.
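The first key point can be checked by hand. The following is a minimal sketch, assuming a small Series of fee values with one missing entry; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# A small Series with one missing value
fee = pd.Series([20000, 25000, 30000, 22000, np.nan], name="Fee")

# sum() skips NaN by default and count() counts only non-NaN values,
# so dividing one by the other reproduces what mean() does with skipna=True
manual_mean = fee.sum() / fee.count()
builtin_mean = fee.mean()

print(manual_mean)   # 24250.0
print(builtin_mean)  # 24250.0
```

Note that the divisor is 4, not 5, because the NaN entry is not counted.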

Syntax of DataFrame.mean()

Following is the syntax of the DataFrame.mean() function.


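# Syntax of DataFrame.mean() in pandas 2.0 and later
DataFrame.mean(axis=0, skipna=True, numeric_only=False, **kwargs)

# Syntax in older pandas 1.x releases
DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)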

Following are the parameters.

  • axis – The axis along which to compute the mean. Use index/0 to get one mean per column (computed down the rows), or columns/1 to get one mean per row.
  • skipna – Excludes all None/NaN values from the mean calculation. Defaults to True.
  • level – Used with a MultiIndex. Takes an int or level name, default None. Available in pandas 1.x only; removed in pandas 2.0.
  • numeric_only – Includes only int, float and boolean columns. bool, default None in pandas 1.x and False in pandas 2.0 and later.
  • **kwargs – Additional keyword arguments to be passed to the function.

Let’s create a DataFrame from a dict and learn how to use mean() with an example.


import pandas as pd
import numpy as np

# Sample data; np.nan marks the missing values
# (the np.NaN alias was removed in NumPy 2.0)
technologies = {
    'Courses': ["Spark", np.nan, "pandas", "Java", "Spark"],
    'Fee': [20000, 25000, 30000, 22000, np.nan],
    'Duration': ['30days', '40days', '35days', '60days', '50days'],
    'Discount': [1000, 2500, 1500, 1200, 3000]
}
df = pd.DataFrame(technologies)
print(df)

# Output:
#   Courses      Fee Duration  Discount
# 0   Spark  20000.0   30days      1000
# 1     NaN  25000.0   40days      2500
# 2  pandas  30000.0   35days      1500
# 3    Java  22000.0   60days      1200
# 4   Spark      NaN   50days      3000
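Before calling mean(), it can help to check which columns pandas treats as numeric. The sketch below rebuilds the same DataFrame and uses select_dtypes() for that check; it is one possible approach, not the only one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", np.nan, "pandas", "Java", "Spark"],
    "Fee": [20000, 25000, 30000, 22000, np.nan],
    "Duration": ["30days", "40days", "35days", "60days", "50days"],
    "Discount": [1000, 2500, 1500, 1200, 3000],
})

# Keep only the columns with a numeric dtype (int or float here)
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)  # ['Fee', 'Discount']
```

Courses and Duration hold strings, so they are not candidates for mean(). Duration looks numeric but is stored as text such as '30days'.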

Pandas mean() Example

By default, mean() uses axis=0, so it computes the mean down the index axis and returns one value per column as a Series. If the DataFrame contains non-numeric columns, pandas 1.x leaves them out of the result and issues a FutureWarning, while pandas 2.0 and later raise a TypeError instead.


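# Calculate mean (works with a warning on pandas 1.x;
# raises TypeError on pandas 2.0+ because of the string columns)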
val = df.mean()
print(val)

# Output:
# Fee         24250.0
# Discount     1840.0
# dtype: float64

This DataFrame contains non-numeric columns, so pandas 1.x emits a FutureWarning for the call above. In pandas 2.0 and later, the same call raises a TypeError.

Use numeric_only=True to avoid the warning or error.


# Calculate mean for numeric columns only
val = df.mean(numeric_only=True)
print(val)

# Output:
# Fee         24250.0
# Discount     1840.0
# dtype: float64

Calculate Mean on Selected Column or Multiple Columns

To calculate the mean of only one column or a few columns, select them first with df[column_names_list] and then call mean() on the result.


# Mean() on selected columns
val = df[['Discount','Fee']].mean()
print(val)

Note that numeric_only=True is not required here because mean() runs only on numeric columns.
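The return type depends on how the columns are selected. The sketch below, using only the two numeric columns from the example data, shows that a list of names yields a Series of means while a single name yields one number.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Fee": [20000, 25000, 30000, 22000, np.nan],
    "Discount": [1000, 2500, 1500, 1200, 3000],
})

# A list of column names keeps a DataFrame, so mean() returns a Series
multi = df[["Discount", "Fee"]].mean()
print(multi)
# Discount     1840.0
# Fee         24250.0
# dtype: float64

# A single column name gives a Series, so mean() returns one number
single = df["Fee"].mean()
print(single)  # 24250.0
```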

Ignore NaN from Mean

By default skipna=True, so NaN values are ignored in the mean calculation. Setting skipna=False includes them, which makes the mean of any column containing NaN come out as NaN. You can also drop the rows containing NaN from the DataFrame with the dropna() method before calling mean().


# Skip NaN Values
val = df.mean(axis=0,numeric_only=True,skipna=True)
print(val)
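To see how the alternatives differ, here is a minimal sketch that compares skipna=False with dropping rows first. It uses the two numeric columns from the example data; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Fee": [20000, 25000, 30000, 22000, np.nan],
    "Discount": [1000, 2500, 1500, 1200, 3000],
})

# skipna=False: any column containing NaN gets a NaN mean
with_nan = df.mean(numeric_only=True, skipna=False)
print(with_nan)  # Fee is NaN, Discount is 1840.0

# dropna() removes the whole row with the missing Fee,
# so that row's Discount (3000) is excluded as well
dropped = df.dropna(subset=["Fee"]).mean(numeric_only=True)
print(dropped)   # Fee is 24250.0, Discount is 1550.0
```

Dropping rows is not equivalent to skipna=True: the Discount mean changes from 1840.0 to 1550.0 because a valid Discount value is discarded along with the missing Fee.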

Calculate Mean on Column axis

mean() is calculated along an axis. By default it uses axis=0, which computes one mean per column by going down the rows. To compute one mean per row across the columns, use axis=1.


# Axis = 1 for column axis
val = df.mean(axis=1,numeric_only=True)
print(val)

# Output:
# 0    10500.0
# 1    13750.0
# 2    15750.0
# 3    11600.0
# 4     3000.0
# dtype: float64
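The axis can also be given by name. The sketch below, using the two numeric columns from the example data, shows that axis="columns" gives the same result as axis=1 and stores the row means in a new column; the column name RowMean is a hypothetical choice for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Fee": [20000, 25000, 30000, 22000, np.nan],
    "Discount": [1000, 2500, 1500, 1200, 3000],
})

# axis=1 and axis="columns" are two spellings of the same axis
by_number = df.mean(axis=1, numeric_only=True)
by_name = df.mean(axis="columns", numeric_only=True)
print(by_number.equals(by_name))  # True

# Keep the per-row mean next to the data it was computed from
result = df.assign(RowMean=by_number)
print(result["RowMean"].tolist())
# [10500.0, 13750.0, 15750.0, 11600.0, 3000.0]
```

The last row's mean is 3000.0 because the missing Fee is skipped and only Discount contributes.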

FAQ on pandas.DataFrame.mean() Function

What does pandas.DataFrame.mean() do?

The DataFrame.mean() function calculates the mean (average) of the values in a DataFrame, by default down the rows (axis=0), giving one result per column. Non-numeric columns are excluded when numeric_only=True.

Does DataFrame.mean() ignore NaN values by default?

By default it ignores NaN values (skipna=True). You can include them in the calculation by setting skipna=False.

How does DataFrame.mean() handle non-numeric columns?

In pandas 1.x, DataFrame.mean() excludes non-numeric columns and issues a FutureWarning. In pandas 2.0 and later it raises a TypeError. To exclude them explicitly, use the parameter numeric_only=True.

How does DataFrame.mean() handle boolean columns?

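Boolean columns are treated as numeric: True counts as 1 and False as 0, so the mean is the fraction of True values. Note that numeric_only=True still includes boolean columns. To leave them out, select the columns you want first, for example with select_dtypes(exclude='bool').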
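The boolean behavior can be illustrated with a small sketch. The DataFrame and its column names (Passed, Score) are hypothetical and are not part of the example data used elsewhere in this article.

```python
import pandas as pd

flags = pd.DataFrame({
    "Passed": [True, False, True, True],
    "Score": [80, 55, 90, 75],
})

# Boolean columns count as numeric: the mean is the share of True values
with_bool = flags.mean(numeric_only=True)
print(with_bool)     # Passed is 0.75, Score is 75.0

# To leave boolean columns out, filter them before calling mean()
without_bool = flags.select_dtypes(exclude="bool").mean()
print(without_bool)  # Score is 75.0
```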

What does numeric_only=True do in mean()?

Setting numeric_only=True ensures that the mean is calculated only on numeric columns, excluding non-numeric data such as strings and other object-dtype columns.

Conclusion

In this article, you have learned how to calculate the mean of numeric columns while ignoring non-numeric columns, how to calculate the mean of selected columns, how to calculate the mean along the column axis, and how to include or exclude NaN values.
