The DataFrame.mean() function returns the mean of the values over the requested axis in pandas. By default it returns a Series; if level is specified (in pandas versions that still support that parameter), it returns a DataFrame.
pandas mean() Key Points
- Mean is the sum of all the values divided by the number of values
- Calculates the mean of numeric columns; use numeric_only=True to exclude non-numeric columns
- By default ignores NaN values and calculates the mean along the index axis (one value per column).
- Provides a way to calculate mean on column axis.
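The definition in the first key point can be checked by hand. The sketch below uses a small made-up Series (the values are an assumption for illustration, not data from a real source) and relies on the fact that sum() and count() both skip NaN by default.

```python
import numpy as np
import pandas as pd

# Hypothetical values for illustration; the last entry is missing
fees = pd.Series([20000, 25000, 30000, 22000, np.nan])

# sum() and count() both skip NaN, so this averages the 4 present values
manual_mean = fees.sum() / fees.count()

print(manual_mean)   # 24250.0
print(fees.mean())   # 24250.0, same result because skipna=True by default
```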
1. DataFrame.mean() Syntax
Following is the syntax of the DataFrame.mean() function.
# Syntax of DataFrame.mean()
DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
Following are the parameters.
axis – Where to apply the mean: rows or columns. Use index/0 for rows and columns/1 for columns.
skipna – Excludes all None/NaN values from the mean result. Defaults to True.
level – Used with a MultiIndex. Takes an int or level name; default None. Available only in older pandas versions (removed in pandas 2.0).
numeric_only – Excludes all non-numeric values and considers only int, float and boolean columns. bool, default None.
**kwargs – Additional keyword arguments to be passed to the function.
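On pandas versions where the level parameter is no longer accepted, a per-level mean on a MultiIndex is usually obtained with groupby(level=...). The following is a minimal sketch under that assumption; the index names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical two-level index: (Course, Batch)
index = pd.MultiIndex.from_tuples(
    [("Spark", 1), ("Spark", 2), ("Java", 1), ("Java", 2)],
    names=["Course", "Batch"],
)
fees = pd.DataFrame({"Fee": [20000, 24000, 22000, 26000]}, index=index)

# One mean per value of the "Course" level; the result is a DataFrame
per_course = fees.groupby(level="Course").mean()
print(per_course)
```

Here the Spark rows average to 22000.0 and the Java rows to 24000.0.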
Let’s create a DataFrame from Dict and learn how to use the mean with an example.
import pandas as pd
import numpy as np
technologies = ({
'Courses':["Spark",np.nan,"pandas","Java","Spark"],
'Fee' :[20000,25000,30000,22000,np.nan],
'Duration':['30days','40days','35days','60days','50days'],
'Discount':[1000,2500,1500,1200,3000]
})
df = pd.DataFrame(technologies)
print(df)
# Output:
# Courses Fee Duration Discount
# 0 Spark 20000.0 30days 1000
# 1 NaN 25000.0 40days 2500
# 2 pandas 30000.0 35days 1500
# 3 Java 22000.0 60days 1200
# 4 Spark NaN 50days 3000
2. pandas mean() Example
The mean() method calculates the mean of the numeric columns in a pandas DataFrame and returns the result as a Series. By default axis=0, so it calculates the mean along the index axis, giving one value per column. If the DataFrame has non-numeric columns, older pandas versions emit a FutureWarning along with the mean of the numeric columns.
# Calculate mean
val = df.mean()
print(val)
# Output:
# Fee 24250.0
# Discount 1840.0
# dtype: float64
This DataFrame contains non-numeric columns, so older pandas versions emit a FutureWarning here. On pandas 2.0 and later you get a TypeError instead of the warning.
Use numeric_only=True to avoid this warning or error.
# Calculate mean for numeric columns only
val = df.mean(numeric_only=True)
print(val)
# Output:
# Fee 24250.0
# Discount 1840.0
# dtype: float64
3. Calculate Mean on Selected Column or Multiple Columns
If you want to calculate the mean of only one column or of multiple columns, select them using df[column_names_list] (DataFrame object notation) before calling mean().
# Mean() on selected columns
val = df[['Discount','Fee']].mean()
print(val)
Note that here it is not required to use numeric_only=True as we are running mean() on only numeric columns.
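For the single-column case, a minimal sketch follows. It rebuilds only the two numeric columns of the example DataFrame so that it runs on its own. Selecting with single brackets gives a Series, so mean() returns one number instead of a Series.

```python
import numpy as np
import pandas as pd

# Numeric columns of the example DataFrame, rebuilt so the block is self-contained
df = pd.DataFrame({
    'Fee': [20000, 25000, 30000, 22000, np.nan],
    'Discount': [1000, 2500, 1500, 1200, 3000],
})

# Single brackets select a Series, so mean() returns a single number
fee_mean = df['Fee'].mean()
print(fee_mean)           # 24250.0

# Double brackets select a DataFrame, so mean() returns a Series
both = df[['Discount', 'Fee']].mean()
print(both['Discount'])   # 1840.0
```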
4. Ignore NaN from Mean
By default skipna=True, hence all NaN values are ignored in the mean calculation. You can include NaN by setting skipna=False, in which case any column that contains NaN gets NaN as its mean. You can also drop all rows that contain NaN from the DataFrame using the dropna() method.
# Skip NaN Values
val = df.mean(axis=0,numeric_only=True,skipna=True)
print(val)
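The two alternatives mentioned above, skipna=False and dropna(), give different results. The sketch below shows both on the numeric columns of the example DataFrame, rebuilt here so the block runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Fee': [20000, 25000, 30000, 22000, np.nan],
    'Discount': [1000, 2500, 1500, 1200, 3000],
})

# skipna=False: a column containing NaN gets NaN as its mean
with_nan = df.mean(skipna=False)
print(with_nan)   # Fee is NaN, Discount is 1840.0

# dropna() removes all of row 4, so Discount also loses its value 3000
dropped = df.dropna().mean()
print(dropped)    # Fee is 24250.0, Discount is 1550.0
```

Note that dropna() changes the Discount mean as well, because the whole row is removed and not just the missing Fee value.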
5. Calculate Mean on Column axis
mean() is calculated along an axis. By default it uses axis=0, the row (index) axis, which gives one mean per column. To calculate the mean on the column axis, which gives one mean per row, use axis=1.
# Axis = 1 for column axis
val = df.mean(axis=1,numeric_only=True)
print(val)
# Output:
# 0 10500.0
# 1 13750.0
# 2 15750.0
# 3 11600.0
# 4 3000.0
# dtype: float64
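The axis can also be given by name instead of by number. The following sketch, again on a self-contained copy of the numeric columns, uses axis='columns' and axis='index' as the string equivalents of 1 and 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Fee': [20000, 25000, 30000, 22000, np.nan],
    'Discount': [1000, 2500, 1500, 1200, 3000],
})

# axis='columns' is the string form of axis=1: one mean per row
row_means = df.mean(axis='columns')

# axis='index' is the string form of axis=0: one mean per column
col_means = df.mean(axis='index')

print(row_means.tolist())  # [10500.0, 13750.0, 15750.0, 11600.0, 3000.0]
print(col_means.tolist())  # [24250.0, 1840.0]
```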
Conclusion
In this article, you have learned how to calculate the mean of numeric columns while ignoring non-numeric columns, how to calculate the mean of one or multiple selected columns, how to calculate the mean on the column axis, and how to exclude or include NaN values.