The DataFrame.mean() function returns the mean of the values over the requested axis in pandas. By default it returns a Series. In pandas versions before 2.0, specifying the level parameter returned a DataFrame instead.
Key Points –
- Mean is the sum of all the values divided by the number of values.
- Calculates the mean of numeric columns across the index axis (rows) by default (axis=0).
- You can compute the mean along the column axis by setting axis=1.
- By default, NaN values are ignored (skipna=True).
- Non-numeric columns cannot be averaged. Older pandas versions issue a FutureWarning when such columns are included, and newer versions raise an error, unless you use numeric_only=True.
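The first and fourth points can be checked by hand. The following is a minimal sketch, assuming a hypothetical single-column DataFrame named fees; it shows that the mean is the sum divided by the count of non-missing values, not by the number of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical single-column frame used only for this illustration
fees = pd.DataFrame({"Fee": [20000, 25000, 30000, 22000, np.nan]})

# sum() and count() both skip NaN, so the divisor is 4, not 5
manual_mean = fees["Fee"].sum() / fees["Fee"].count()

# mean() skips NaN by default (skipna=True) and should agree
builtin_mean = fees["Fee"].mean()

print(manual_mean, builtin_mean)
```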
Syntax of DataFrame.mean()
Following is the syntax of the DataFrame.mean() function.
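# Syntax of DataFrame.mean() in pandas 1.x
DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
# Syntax in pandas 2.x (the level parameter was removed)
DataFrame.mean(axis=0, skipna=True, numeric_only=False, **kwargs)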
Following are the parameters.
axis – Axis along which to compute the mean. Use 0 or 'index' to get one mean per column (the default), and 1 or 'columns' to get one mean per row.
skipna – Excludes None/NaN values from the calculation. Defaults to True.
level – Used with a MultiIndex. Takes an int or level name, default None. This parameter was removed in pandas 2.0.
numeric_only – Includes only int, float, and boolean columns. In older versions the default is None; in pandas 2.0 and later it is False.
**kwargs – Additional keyword arguments to be passed to the function.
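On pandas versions that no longer accept level, one commonly used way to get a per-level mean is to group on the index level first. The sketch below assumes a hypothetical MultiIndex DataFrame; the level names Course and Batch are made up for illustration.

```python
import pandas as pd

# Hypothetical MultiIndex frame; "Course" and "Batch" are illustrative level names
index = pd.MultiIndex.from_tuples(
    [("Spark", 1), ("Spark", 2), ("Java", 1), ("Java", 2)],
    names=["Course", "Batch"],
)
fees = pd.DataFrame({"Fee": [20000, 24000, 22000, 26000]}, index=index)

# Group the rows by one index level, then average within each group
level_mean = fees.groupby(level="Course").mean()
print(level_mean)
```

The result is a DataFrame with one row per value of the chosen level.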
Let’s create a DataFrame from Dict and learn how to use the mean with an example.
import pandas as pd
import numpy as np
technologies = ({
'Courses':["Spark",np.nan,"pandas","Java","Spark"],
'Fee' :[20000,25000,30000,22000,np.nan],
'Duration':['30days','40days','35days','60days','50days'],
'Discount':[1000,2500,1500,1200,3000]
})
df = pd.DataFrame(technologies)
print(df)
# Output:
# Courses Fee Duration Discount
# 0 Spark 20000.0 30days 1000
# 1 NaN 25000.0 40days 2500
# 2 pandas 30000.0 35days 1500
# 3 Java 22000.0 60days 1200
# 4 Spark NaN 50days 3000
Pandas mean() Example
By default, mean() calculates the mean of each numeric column in a pandas DataFrame and returns the result as a Series. Because the default is axis=0, the mean is computed down the index axis, giving one value per column. If the DataFrame has non-numeric columns, older pandas versions print a FutureWarning along with the means of the numeric columns.
# Calculate mean
val = df.mean()
print(val)
# Output:
# Fee 24250.0
# Discount 1840.0
# dtype: float64
The DataFrame here contains non-numeric columns, so older pandas versions show a FutureWarning for this call. In pandas 2.0 and later, the same call raises a TypeError instead of a warning.
Use numeric_only=True to avoid the warning or error.
# Calculate mean for numeric columns only
val = df.mean(numeric_only=True)
print(val)
# Output:
# Fee 24250.0
# Discount 1840.0
# dtype: float64
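An alternative to numeric_only=True is to select the numeric columns yourself before calling mean(). The following is a sketch of that approach using select_dtypes(); the variable name numeric_df is an assumption made for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", np.nan, "pandas", "Java", "Spark"],
    "Fee": [20000, 25000, 30000, 22000, np.nan],
    "Duration": ["30days", "40days", "35days", "60days", "50days"],
    "Discount": [1000, 2500, 1500, 1200, 3000],
})

# Keep only columns with a numeric dtype (here: Fee and Discount)
numeric_df = df.select_dtypes(include="number")

# No numeric_only argument is needed, since only numeric columns remain
val = numeric_df.mean()
print(val)
```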
Calculate Mean on Selected Column or Multiple Columns
To calculate the mean of one or more specific columns, select them first using df[column_names_list] (DataFrame object notation) and then call mean().
# Mean() on selected columns
val = df[['Discount','Fee']].mean()
print(val)
Note that numeric_only=True is not required here, as mean() runs only on numeric columns.
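The return type depends on how the columns are selected. As a small sketch (the variable names fee_mean and both_means are assumptions for this example), a single column selected by name yields a scalar, while a list of columns yields a Series indexed by column name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Fee": [20000, 25000, 30000, 22000, np.nan],
    "Discount": [1000, 2500, 1500, 1200, 3000],
})

# Single column (a Series): mean() returns one scalar value
fee_mean = df["Fee"].mean()

# List of columns (a DataFrame): mean() returns a Series, one entry per column
both_means = df[["Discount", "Fee"]].mean()

print(fee_mean)
print(both_means)
```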
Ignore NaN from Mean
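By default skipna=True, so all NaN values are ignored in the mean calculation. If you set skipna=False, NaN values are included, and the mean of any column that contains a NaN is itself NaN. You can also drop all rows containing NaN from the DataFrame using the dropna() method before calculating the mean.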
# Skip NaN Values
val = df.mean(axis=0,numeric_only=True,skipna=True)
print(val)
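To see the difference between these options, the sketch below compares skipna=False with dropna() on a reduced copy of the example data (only the two numeric columns). Note that dropna() removes whole rows, so it also changes the mean of columns that had no NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Fee": [20000, 25000, 30000, 22000, np.nan],
    "Discount": [1000, 2500, 1500, 1200, 3000],
})

# skipna=False: a column containing NaN gets NaN as its mean
with_nan = df.mean(skipna=False)

# dropna() removes the last row entirely, including its Discount value
dropped = df.dropna().mean()

print(with_nan)
print(dropped)
```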
Calculate Mean on Column axis
mean() is calculated along an axis. By default it uses axis=0, which averages down the rows and returns one value per column. To calculate the mean across the columns and get one value per row, use axis=1.
# Axis = 1 for column axis
val = df.mean(axis=1,numeric_only=True)
print(val)
# Output:
# 0 10500.0
# 1 13750.0
# 2 15750.0
# 3 11600.0
# 4 3000.0
# dtype: float64
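Because the row-wise result is aligned with the DataFrame's index, it can be stored as a new column. This is a minimal sketch; the column name RowMean is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Fee": [20000, 25000, 30000, 22000, np.nan],
    "Discount": [1000, 2500, 1500, 1200, 3000],
})

# One mean per row; NaN in Fee is skipped, so the last row uses Discount only
df["RowMean"] = df[["Fee", "Discount"]].mean(axis=1)
print(df)
```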
FAQ on pandas.DataFrame.mean() Function
The DataFrame.mean() function calculates the mean (average) of numeric values in the DataFrame, by default down the rows (axis=0), giving one value per column. Non-numeric columns are left out when numeric_only=True.
By default it ignores NaN values (skipna=True). You can include them in the calculation by setting skipna=False.
When non-numeric columns are present, older pandas versions exclude them and issue a warning, while pandas 2.0 and later raise an error. To exclude them without a warning or error, use the parameter numeric_only=True.
Boolean columns are treated as numeric: True counts as 1 and False as 0, and this also holds with numeric_only=True. If you do not want boolean columns considered, drop or deselect them before calling mean().
Setting numeric_only=True ensures that the mean is calculated only on numeric columns, excluding non-numeric data types such as strings or objects.
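The handling of boolean columns can be illustrated with a short sketch. The DataFrame flags and its Online column are hypothetical; select_dtypes(exclude="bool") is one way to leave boolean columns out.

```python
import pandas as pd

# Hypothetical frame with one numeric and one boolean column
flags = pd.DataFrame({"Fee": [20000, 30000], "Online": [True, False]})

# Boolean columns count as numeric: True is 1 and False is 0
with_bool = flags.mean(numeric_only=True)

# Exclude boolean columns explicitly before averaging
without_bool = flags.select_dtypes(exclude="bool").mean()

print(with_bool)
print(without_bool)
```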
Conclusion
In this article, you have learned how to calculate mean() on numeric columns while ignoring non-numeric columns, how to calculate the mean on multiple columns and on the column axis, and how to exclude or include NaN values.