  • Post last modified:March 27, 2024
pandas.DataFrame.mean() Examples

The DataFrame.mean() function returns the mean of the values over the requested axis in pandas. By default it returns a Series. In older pandas versions that still support the level parameter, specifying level returns a DataFrame.

pandas mean() Key Points

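  1. The mean is the sum of all the values divided by the number of values.
  2. It is calculated on numeric columns; use numeric_only=True to skip non-numeric columns.
  3. By default, it ignores NaN values and computes the mean along the index axis, giving one value per column.
  4. Setting axis=1 calculates the mean along the column axis, giving one value per row.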
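As a quick check of the first and third points, the sketch below compares mean() with a manual sum divided by count. The one-column DataFrame named fees is a hypothetical sample made up for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical one-column frame, used only for this illustration
fees = pd.DataFrame({"Fee": [20000, 25000, 30000, 22000, np.nan]})

# sum() skips NaN and count() counts only non-NaN values,
# so their ratio matches the default behavior of mean()
manual_mean = fees["Fee"].sum() / fees["Fee"].count()
builtin_mean = fees["Fee"].mean()

print(manual_mean)   # 24250.0
print(builtin_mean)  # 24250.0
```

The NaN entry is left out of both the sum and the count, which is why the divisor is 4 rather than 5.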

1. DataFrame.mean() Syntax

Following is the syntax of the DataFrame.mean() function.


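# Syntax of DataFrame.mean() in older pandas (1.x) releases
DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

# Syntax in pandas 2.x, where the level parameter no longer exists
DataFrame.mean(axis=0, skipna=True, numeric_only=False, **kwargs)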

Following are the parameters.

  • axis – The axis to compute the mean along. Use 0 or 'index' to get one mean per column (the default), or 1 or 'columns' to get one mean per row.
  • skipna – Excludes None/NaN values when computing the mean. bool, default True
  • level – Used with a MultiIndex. Takes an int or level name, default None. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0.
  • numeric_only – Includes only int, float, and boolean columns. bool; the default is None in older pandas and False in pandas 2.0 and later.
  • **kwargs – Additional keyword arguments to be passed to the function.
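Since the level parameter is not available in pandas 2.0 and later, a per-level mean on a MultiIndex can be computed with groupby(level=...) instead. The following is a minimal sketch; the index values and the names Course and Batch are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical two-level index (Course, Batch), for illustration only
index = pd.MultiIndex.from_tuples(
    [("Spark", "A"), ("Spark", "B"), ("Java", "A"), ("Java", "B")],
    names=["Course", "Batch"],
)
fees = pd.DataFrame({"Fee": [20000, 30000, 22000, 26000]}, index=index)

# Group on one index level, then take the mean of each group
per_course = fees.groupby(level="Course").mean()
print(per_course)
```

The result is a DataFrame with one row per course, which is the same shape that mean(level=...) returned in older releases.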

Let’s create a DataFrame from a dictionary and learn how to use mean() with an example.


import pandas as pd
import numpy as np

# Sample data: Courses and Fee each contain one missing value
technologies = {
    'Courses':["Spark",np.nan,"pandas","Java","Spark"],
    'Fee' :[20000,25000,30000,22000,np.nan],  # np.nan, since the np.NaN alias was removed in NumPy 2.0
    'Duration':['30days','40days','35days','60days','50days'],
    'Discount':[1000,2500,1500,1200,3000]
}
df = pd.DataFrame(technologies)
print(df)

# Output:
#   Courses      Fee Duration  Discount
# 0   Spark  20000.0   30days      1000
# 1     NaN  25000.0   40days      2500
# 2  pandas  30000.0   35days      1500
# 3    Java  22000.0   60days      1200
# 4   Spark      NaN   50days      3000

2. pandas mean() Example

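By default, the mean() method calculates the mean of each numeric column in a pandas DataFrame and returns the result as a Series. Because axis=0 is the default, the mean is computed along the index axis, giving one value per column. If the DataFrame has non-numeric columns, older pandas versions emit a FutureWarning and still return the mean of the numeric columns, while pandas 2.0 and later typically raise a TypeError instead.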


# Calculate mean (the output below is from an older pandas version)
val = df.mean()
print(val)

# Output:
# Fee         24250.0
# Discount     1840.0
# dtype: float64

My DataFrame contains non-numeric columns, so older pandas versions emit a FutureWarning here. On newer versions you may not see this warning and may get an error instead.

Use numeric_only=True to avoid this warning or error.


# Calculate mean for numeric columns only
val = df.mean(numeric_only=True)
print(val)

# Output:
# Fee         24250.0
# Discount     1840.0
# dtype: float64
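Another way to get the same result, shown here as a sketch rather than as the recommended approach, is to keep only the numeric columns with select_dtypes() before calling mean(). The variable name numeric_df is an assumption introduced for this example.

```python
import numpy as np
import pandas as pd

# Same sample data as in the article
df = pd.DataFrame({
    "Courses": ["Spark", np.nan, "pandas", "Java", "Spark"],
    "Fee": [20000, 25000, 30000, 22000, np.nan],
    "Duration": ["30days", "40days", "35days", "60days", "50days"],
    "Discount": [1000, 2500, 1500, 1200, 3000],
})

# Keep only numeric columns, then calculate the mean of each one
numeric_df = df.select_dtypes(include="number")
val = numeric_df.mean()
print(val)
```

This makes the column filtering explicit, which can help when the same numeric subset is reused for other aggregations.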

3. Calculate Mean on Selected Column or Multiple Columns

To calculate the mean of only one column or a subset of columns, select the columns with df[column_names_list] (DataFrame object notation) before calling mean().


# Mean() on selected columns
val = df[['Discount','Fee']].mean()
print(val)

Note that numeric_only=True is not required here because mean() runs only on numeric columns.
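For comparison, selecting a single column with df['Fee'] gives a Series, and mean() on a Series returns a scalar rather than a Series. A small sketch using the same sample data; the names fee_mean and selected are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Same sample data as in the article
df = pd.DataFrame({
    "Courses": ["Spark", np.nan, "pandas", "Java", "Spark"],
    "Fee": [20000, 25000, 30000, 22000, np.nan],
    "Duration": ["30days", "40days", "35days", "60days", "50days"],
    "Discount": [1000, 2500, 1500, 1200, 3000],
})

# Single column: Series.mean() returns one scalar value
fee_mean = df["Fee"].mean()
print(fee_mean)  # 24250.0

# List of columns: DataFrame.mean() returns a Series indexed by column name
selected = df[["Discount", "Fee"]].mean()
print(selected)
```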

4. Ignore NaN from Mean

By default skipna=True, so all NaN values are excluded from the mean calculation. Setting skipna=False includes them, in which case the mean of any column containing NaN is itself NaN. You can also drop rows with NaN values from the DataFrame using the dropna() method before calling mean().


# Skip NaN Values
val = df.mean(axis=0,numeric_only=True,skipna=True)
print(val)
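To see the difference, the sketch below runs the same calculation with skipna=False and then with dropna(). Restricting dropna() to the Fee column through subset=['Fee'] is a choice made for this example, not something the original code does.

```python
import numpy as np
import pandas as pd

# Same sample data as in the article
df = pd.DataFrame({
    "Courses": ["Spark", np.nan, "pandas", "Java", "Spark"],
    "Fee": [20000, 25000, 30000, 22000, np.nan],
    "Duration": ["30days", "40days", "35days", "60days", "50days"],
    "Discount": [1000, 2500, 1500, 1200, 3000],
})

# Include NaN: Fee contains a NaN, so its mean becomes NaN
val_with_nan = df.mean(numeric_only=True, skipna=False)
print(val_with_nan)

# Drop rows where Fee is NaN, then calculate the mean
val_dropped = df.dropna(subset=["Fee"]).mean(numeric_only=True)
print(val_dropped)
```

Note that dropna() removes whole rows, so the Discount mean also changes (1550.0 instead of 1840.0), whereas skipna=True only skips the missing cell in each column.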

5. Calculate Mean on Column axis

mean() is calculated along an axis. By default it uses axis=0, which computes the mean down the rows and returns one value per column. To calculate the mean across the columns and get one value per row, use axis=1.


# Axis = 1 for column axis
val = df.mean(axis=1,numeric_only=True)
print(val)

# Output:
# 0    10500.0
# 1    13750.0
# 2    15750.0
# 3    11600.0
# 4     3000.0
# dtype: float64
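The axis can also be given by name, and the per-row result can be attached to the DataFrame as a new column. This is a sketch only; the column name RowMean is hypothetical.

```python
import numpy as np
import pandas as pd

# Same sample data as in the article
df = pd.DataFrame({
    "Courses": ["Spark", np.nan, "pandas", "Java", "Spark"],
    "Fee": [20000, 25000, 30000, 22000, np.nan],
    "Duration": ["30days", "40days", "35days", "60days", "50days"],
    "Discount": [1000, 2500, 1500, 1200, 3000],
})

# axis="columns" is the named equivalent of axis=1
row_mean = df.mean(axis="columns", numeric_only=True)

# Attach the per-row mean as a new column (RowMean is a hypothetical name)
df_with_mean = df.assign(RowMean=row_mean)
print(df_with_mean[["Fee", "Discount", "RowMean"]])
```

Row 4 has a NaN in Fee, so with the default skipna=True its mean is based on Discount alone.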

Conclusion

In this article, you have learned how to calculate the mean of numeric columns while ignoring non-numeric columns, how to calculate the mean of selected columns, how to calculate the mean along the column axis, and how to include or exclude NaN values.

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.