Spark By {Examples}

pandas.DataFrame.mean() Examples


The DataFrame.mean() function returns the mean of the values over the requested axis in pandas. By default it returns a Series; if level is specified (supported only before pandas 2.0), it returns a DataFrame.
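The level keyword was deprecated in pandas 1.3.0 and removed in 2.0.0, and pandas recommends groupby instead. A minimal sketch of the current equivalent, assuming a small hypothetical MultiIndex DataFrame (the Course/Year index and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical MultiIndex data, for illustration only
idx = pd.MultiIndex.from_tuples(
    [("Spark", 2023), ("Spark", 2024), ("pandas", 2023)],
    names=["Course", "Year"],
)
df = pd.DataFrame({"Fee": [20000, 22000, 30000]}, index=idx)

# Replacement for the removed df.mean(level="Course"):
# group on one index level, then average each group
by_course = df.groupby(level="Course").mean()
print(by_course)
```

The result is a DataFrame with one row per Course value, which matches what mean(level=...) used to return.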

pandas mean() Key Points

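  1. The mean is the sum of all the values divided by the number of values.
  2. Calculates the mean only on numeric columns; use numeric_only=True to exclude non-numeric columns.
  3. By default, ignores NaN values and calculates the mean along the index axis (axis=0), giving one value per column.
  4. Can also calculate the mean along the column axis (axis=1), giving one value per row.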
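The first and third key points can be checked by hand. A short sketch using the Fee values from the example DataFrame built later in this article: sum() and count() both skip NaN, so their ratio matches mean() with its default skipna=True.

```python
import numpy as np
import pandas as pd

fee = pd.Series([20000, 25000, 30000, 22000, np.nan])

# Manual mean: sum of the non-NaN values divided by how many there are
manual_mean = fee.sum() / fee.count()

# pandas skips NaN by default (skipna=True), so the two results agree
pandas_mean = fee.mean()
print(manual_mean, pandas_mean)
```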

1. DataFrame.mean() Syntax

Following is the syntax of the DataFrame.mean() function.


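# Syntax of DataFrame.mean() in older pandas 1.x releases
DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
# Syntax in pandas 2.x (level removed, numeric_only defaults to False)
DataFrame.mean(axis=0, skipna=True, numeric_only=False, **kwargs)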

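Following are the parameters of DataFrame.mean().

  1. axis: {0 or 'index', 1 or 'columns'}. The axis along which the mean is computed; 0 averages down each column, 1 averages across each row.
  2. skipna: Exclude NaN/null values when computing the result. Default True.
  3. level: If the axis is a MultiIndex, compute the mean per level; the result is a DataFrame. Deprecated in pandas 1.3.0 and removed in 2.0.0; use groupby(level=...).mean() instead.
  4. numeric_only: Include only float, int, and boolean columns. Defaults to False in pandas 2.x.
  5. **kwargs: Additional keyword arguments passed to the function.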
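The axis parameter accepts either an integer or a string alias. A minimal sketch, assuming a small all-numeric DataFrame so that numeric_only is not needed, showing that 'index' behaves like 0 and 'columns' behaves like 1:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Fee': [20000, 25000, 30000, 22000, np.nan],
    'Discount': [1000, 2500, 1500, 1200, 3000],
})

# axis='index' (same as axis=0): one mean per column
per_column = df.mean(axis='index')

# axis='columns' (same as axis=1): one mean per row
per_row = df.mean(axis='columns')

print(per_column)
print(per_row)
```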

Let’s create a DataFrame from a dict and learn how to use mean() with an example.


import pandas as pd
import numpy as np

# Sample data with a NaN in Courses and in Fee
technologies = {
    'Courses': ["Spark", np.nan, "pandas", "Java", "Spark"],
    'Fee': [20000, 25000, 30000, 22000, np.nan],
    'Duration': ['30days', '40days', '35days', '60days', '50days'],
    'Discount': [1000, 2500, 1500, 1200, 3000]
}
df = pd.DataFrame(technologies)
print(df)

# Output:
#  Courses      Fee Duration  Discount
# 0   Spark  20000.0   30days      1000
# 1     NaN  25000.0   40days      2500
# 2  pandas  30000.0   35days      1500
# 3    Java  22000.0   60days      1200
# 4   Spark      NaN   50days      3000
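Since mean() can only average numeric data, it can help to check which columns are numeric before calling it. A short sketch that re-creates the same DataFrame and lists the columns that pandas treats as numeric (these are the ones numeric_only=True keeps here):

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

technologies = {
    'Courses': ["Spark", np.nan, "pandas", "Java", "Spark"],
    'Fee': [20000, 25000, 30000, 22000, np.nan],
    'Duration': ['30days', '40days', '35days', '60days', '50days'],
    'Discount': [1000, 2500, 1500, 1200, 3000]
}
df = pd.DataFrame(technologies)

# Inspect dtypes and collect the numeric column names
print(df.dtypes)
numeric_cols = [col for col in df.columns if is_numeric_dtype(df[col])]
print(numeric_cols)
```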

2. pandas mean() Example

The mean() method calculates the mean of each numeric column in the DataFrame and returns the result as a Series. By default axis=0, so the mean is calculated along the index axis, giving one value per column. If the DataFrame has non-numeric columns, pandas 1.x drops them and emits a FutureWarning along with the mean of the numeric columns.


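# Calculate mean
# pandas 1.x: drops non-numeric columns with a FutureWarning
# pandas 2.x: raises TypeError because of the non-numeric columns
val = df.mean()
print(val)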

# Output:
# Fee         24250.0
# Discount     1840.0
# dtype: float64

Because this DataFrame contains non-numeric columns (Courses and Duration), pandas 1.x emits a FutureWarning. Starting with pandas 2.0, numeric_only defaults to False, so the same call raises a TypeError instead of a warning.

Use numeric_only=True to exclude non-numeric columns and avoid the warning or error.


# Calculate mean for numeric columns only
val = df.mean(numeric_only=True)
print(val)

# Output:
# Fee         24250.0
# Discount     1840.0
# dtype: float64
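Another way to get the same result is to select the numeric columns explicitly with select_dtypes() and then call mean(). A minimal sketch on the same DataFrame; for this data it matches numeric_only=True, though note that select_dtypes(include='number') does not pick up boolean columns, while numeric_only=True does.

```python
import numpy as np
import pandas as pd

technologies = {
    'Courses': ["Spark", np.nan, "pandas", "Java", "Spark"],
    'Fee': [20000, 25000, 30000, 22000, np.nan],
    'Duration': ['30days', '40days', '35days', '60days', '50days'],
    'Discount': [1000, 2500, 1500, 1200, 3000]
}
df = pd.DataFrame(technologies)

# Keep only numeric columns explicitly, then take the mean
numeric_mean = df.select_dtypes(include='number').mean()
print(numeric_mean)
```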

3. Calculate Mean on Selected Column or Multiple Columns

To calculate the mean of only one column or a few columns, select them first with df[column_names_list] (DataFrame object notation) and then call mean().


# mean() on selected columns
val = df[['Discount','Fee']].mean()
print(val)

# Output:
# Discount     1840.0
# Fee         24250.0
# dtype: float64

Note that here it is not required to use numeric_only=True as we are running mean() on only numeric columns.
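The bracket style decides the return type. A short sketch on the numeric columns of the example data: single brackets select a Series, so mean() returns a single number, while double brackets select a one-column DataFrame, so mean() returns a Series.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Fee': [20000, 25000, 30000, 22000, np.nan],
    'Discount': [1000, 2500, 1500, 1200, 3000],
})

# Single brackets: Series, so mean() returns a scalar
fee_mean = df['Fee'].mean()

# Double brackets: one-column DataFrame, so mean() returns a Series
fee_mean_series = df[['Fee']].mean()

print(fee_mean)
print(fee_mean_series)
```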

4. Ignore NaN from Mean

By default skipna=True, so NaN values are excluded from the mean calculation. Setting skipna=False includes them, and any column that contains a NaN then returns NaN as its mean. Alternatively, you can drop rows containing NaN from the DataFrame with the dropna() method before calling mean().


# Skip NaN values (the default)
val = df.mean(axis=0, numeric_only=True, skipna=True)
print(val)

# Output:
# Fee         24250.0
# Discount     1840.0
# dtype: float64
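The two alternatives mentioned above behave quite differently. A sketch on the same DataFrame: skipna=False turns the Fee mean into NaN, while dropna() removes every row that has a NaN in any column (here row 1, whose Courses is NaN, and row 4, whose Fee is NaN) before averaging, so even the Discount mean changes.

```python
import numpy as np
import pandas as pd

technologies = {
    'Courses': ["Spark", np.nan, "pandas", "Java", "Spark"],
    'Fee': [20000, 25000, 30000, 22000, np.nan],
    'Duration': ['30days', '40days', '35days', '60days', '50days'],
    'Discount': [1000, 2500, 1500, 1200, 3000]
}
df = pd.DataFrame(technologies)

# skipna=False: a column containing any NaN gets NaN as its mean
with_nan = df.mean(numeric_only=True, skipna=False)
print(with_nan)

# dropna(): drop rows with a NaN in any column, then average what is left
after_drop = df.dropna().mean(numeric_only=True)
print(after_drop)
```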

5. Calculate Mean on Column axis

mean() is calculated along an axis. By default it uses axis=0 (the index axis), which averages down each column. To calculate the mean along the column axis, that is, across each row, use axis=1.


# Axis = 1 for column axis
val = df.mean(axis=1,numeric_only=True)
print(val)

# Output:
# 0    10500.0
# 1    13750.0
# 2    15750.0
# 3    11600.0
# 4     3000.0
# dtype: float64
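skipna also applies to row-wise means. In the output above, row 4 is 3000.0 because its NaN Fee is skipped and only Discount is averaged. A short sketch on the same DataFrame comparing that with skipna=False, where row 4 becomes NaN:

```python
import numpy as np
import pandas as pd

technologies = {
    'Courses': ["Spark", np.nan, "pandas", "Java", "Spark"],
    'Fee': [20000, 25000, 30000, 22000, np.nan],
    'Duration': ['30days', '40days', '35days', '60days', '50days'],
    'Discount': [1000, 2500, 1500, 1200, 3000]
}
df = pd.DataFrame(technologies)

# Row-wise mean, skipping NaN (default)
row_skip = df.mean(axis=1, numeric_only=True)

# Row-wise mean, keeping NaN: any NaN in a row makes that row's mean NaN
row_keep = df.mean(axis=1, numeric_only=True, skipna=False)

print(row_skip)
print(row_keep)
```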

Conclusion

In this article, you have learned how to calculate the mean of numeric columns while ignoring non-numeric columns, how to calculate the mean of selected columns, how to calculate the mean along the column axis, and how to exclude or include NaN values.

