  • Post last modified:September 26, 2024
How to Get Column Average or Mean in Pandas DataFrame

To get the column average (mean) of a pandas DataFrame, use either the mean() or the describe() method. The mean() method returns the mean of the values along the specified axis. When you apply it to a Series, it returns a scalar value, which is the mean of all the observations in that Series.


Related: Get all column names from Pandas DataFrame.

In this article, I will explain how to get column average or mean from pandas DataFrame with examples.

Key Points –

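  • The DataFrame.mean() method calculates the mean of each column; pass numeric_only=True to restrict it to numeric columns (in pandas 2.0 and later, non-numeric columns are no longer skipped by default).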
  • You can calculate the mean of a specific column by selecting it first and then applying .mean().
  • The default behavior of .mean(axis=0) is to compute the mean for each column (along the vertical axis).
  • By default, .mean() ignores NaN values (skipna=True); with skipna=False, any column that contains a NaN yields NaN as its mean.
  • The .describe() method provides a quick summary that includes the mean for each numerical column.
  • You can compute the mean of filtered rows by applying conditional logic before calculating the mean.
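The last key point, the mean of filtered rows, can be sketched as follows. This is a minimal illustration, assuming a small DataFrame with the same Fee and Discount values as the one built later in this article; the threshold of 1000 and the names mask and filtered_fee_mean are arbitrary choices made for this example.

```python
import pandas as pd

# Sample data mirroring the article's Fee and Discount columns
df = pd.DataFrame(
    {
        "Fee": [20000, 25000, 22000, None, 30000],
        "Discount": [1000, 2300, 1200, 2000, None],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Boolean mask: rows whose Discount exceeds 1000 (NaN compares as False)
mask = df["Discount"] > 1000

# Average Fee over the matching rows only; NaN fees are skipped by default
filtered_fee_mean = df.loc[mask, "Fee"].mean()
print(filtered_fee_mean)  # 23500.0
```

Rows r2, r3 and r4 pass the filter, and since the Fee of r4 is missing, the result is the average of 25000 and 22000.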

Quick Examples of Get Column Average Or Mean In DataFrame

Below are some quick examples of how to get column average or mean in pandas DataFrame.


# Quick examples of get column average or mean

# Example 1: Using Series.mean() on one column
# To get column average
df2 = df["Fee"].mean()

# Example 2: Using DataFrame.mean() 
# To get the mean of every numeric column
df2 = df.mean(numeric_only=True)

# Example 3: Get multiple columns mean
# Using DataFrame.mean()
df2 = df[["Fee","Discount"]].mean()

# Example 4: Average of each column 
# Using DataFrame.mean()
df2 = df.mean(axis=0, numeric_only=True)

# Example 5: Find the mean without skipping NaN values 
# Using DataFrame.mean()
df2 = df.mean(axis=0, skipna=False, numeric_only=True)

# Example 6: Using DataFrame.describe() method
df2 = df.describe()

Now, let’s create a DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains column names Courses, Fee, Duration and Discount.


# Get Column Average Or Mean In DataFrame
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Python","pandas",None],
    'Fee' :[20000,25000,22000,None,30000],
    'Duration':['30days','40days','35days','None','50days'],
    'Discount':[1000,2300,1200,2000,None]
              }
index_labels=['r1','r2','r3','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print("Create DataFrame:\n", df)

Running this prints the five-row DataFrame. Note that the Fee and Discount columns each contain one missing value (NaN).

Pandas Get Column Mean

To get the mean of a particular column, select that column first and then call mean() on it. For example, df["Fee"].mean() returns the mean of the Fee column only.


# Using DataFrame.mean() method to get column average
df2 = df["Fee"].mean()
print(" Get the mean of the 'Fee' column:\n", df2)

Yields below output.

# Output:
# Get the mean of the 'Fee' column:
# 24250.0
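As a small illustration of the difference in return types, the sketch below calls mean() on a single column selected as a Series and on the same column selected as a one-column DataFrame. The variable names fee_mean and fee_mean_series are chosen for this example only.

```python
import pandas as pd

df = pd.DataFrame({"Fee": [20000, 25000, 22000, None, 30000]})

# Single brackets select a Series, so mean() returns a scalar
fee_mean = df["Fee"].mean()

# Double brackets select a one-column DataFrame, so mean() returns a Series
fee_mean_series = df[["Fee"]].mean()

print(fee_mean)
print(type(fee_mean_series).__name__)  # Series
```

Use the single-bracket form when you need a plain number, for instance to feed into further arithmetic.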

Pandas Get Mean for All Columns

To compute the mean for all numeric columns, apply the mean() function to the whole DataFrame.

Because this DataFrame also contains string columns (Courses and Duration), pass numeric_only=True. In pandas 2.0 and later, calling df.mean() without it raises a TypeError on such data.


# Using DataFrame.mean() to get the mean of every numeric column
df2 = df.mean(numeric_only=True)
print("Get mean of entire DataFrame:\n", df2)

# Output:
# Get mean of entire DataFrame:
# Fee         24250.0
# Discount     1625.0
# dtype: float64

Alternatively, you can calculate the mean of selected columns only. Pass a list of column names to the DataFrame indexer (df[[...]]) to select those columns, then call mean() on the result. It returns a Series holding the mean of each selected column.


# Get multiple columns mean using DataFrame.mean()
df2 = df[["Fee","Discount"]].mean()
print("Get mean of specified columns:\n", df2)

# Output:
# Get mean of specified columns:
# Fee         24250.0
# Discount     1625.0
# dtype: float64

Similarly, you can pass axis=0 explicitly to calculate the column-wise mean of the DataFrame. This is the default axis, so the result is the same as above.


# Average of each column using DataFrame.mean()
df2 = df.mean(axis=0, numeric_only=True)
print("Get column-wise mean:\n", df2)

# Output:
# Get column-wise mean:
# Fee         24250.0
# Discount     1625.0
# dtype: float64
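For contrast with axis=0, the sketch below also computes the mean along axis=1, which averages across the columns of each row. It assumes a DataFrame holding only the numeric Fee and Discount columns from this article; col_means and row_means are names chosen for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Fee": [20000, 25000, 22000, None, 30000],
        "Discount": [1000, 2300, 1200, 2000, None],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# axis=0 (the default): one mean per column
col_means = df.mean(axis=0)

# axis=1: one mean per row, computed across Fee and Discount
row_means = df.mean(axis=1)

print(col_means)
print(row_means)
```

For row r4, where Fee is missing, the row mean is simply the Discount value, because NaN values are skipped by default.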

Get the Column Mean Including NaN Values

By default, mean() excludes NaN/null values while calculating the mean or average. To stop it from skipping these values, use the skipna=False param.


# Find the mean without skipping NaN values using DataFrame.mean()
df2 = df.mean(axis=0, skipna=False, numeric_only=True)
print(df2)

Because the Fee and Discount columns each contain one NaN, both means come out as NaN here.
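The effect of skipna is easy to see side by side on a single column. The following is a minimal sketch using the Fee values from this article as a standalone Series; mean_skip and mean_keep are names introduced for the example.

```python
import math

import pandas as pd

fee = pd.Series([20000, 25000, 22000, None, 30000], name="Fee")

# Default (skipna=True): the missing value is left out of the average
mean_skip = fee.mean()

# skipna=False: a single NaN makes the whole result NaN
mean_keep = fee.mean(skipna=False)

print(mean_skip)              # 24250.0
print(math.isnan(mean_keep))  # True
```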

Using DataFrame.describe() Method

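You can also use DataFrame.describe() to generate summary statistics for the numeric columns of the DataFrame. The mean appears as one row of the result, alongside the count, standard deviation, minimum, quartiles, and maximum.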


# Using DataFrame.describe() method
df2 = df.describe()
print(df2)

Yields below output.


# Output:
               Fee     Discount
count      4.00000     4.000000
mean   24250.00000  1625.000000
std     4349.32945   623.832242
min    20000.00000  1000.000000
25%    21500.00000  1150.000000
50%    23500.00000  1600.000000
75%    26250.00000  2075.000000
max    30000.00000  2300.000000
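If you only need the means out of the describe() summary, one option is to pick the "mean" row by label. This is a sketch under the assumption that the DataFrame holds the Fee and Discount columns shown above; means_from_describe is a name chosen for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Fee": [20000, 25000, 22000, None, 30000],
        "Discount": [1000, 2300, 1200, 2000, None],
    }
)

# describe() returns a DataFrame indexed by statistic name,
# so .loc["mean"] selects the row of column means as a Series
means_from_describe = df.describe().loc["mean"]
print(means_from_describe)
```

Calling mean() directly is cheaper when the other statistics are not needed, since describe() computes all of them.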

Complete Example For Get Column Average or Mean


# Example For Get Column Average or Mean
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Python","pandas",None],
    'Fee' :[20000,25000,22000,None,30000],
    'Duration':['30days','40days','35days','None','50days'],
    'Discount':[1000,2300,1200,2000,None]
              }
index_labels=['r1','r2','r3','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print("Create DataFrame:\n", df)

# Using DataFrame.mean() method to get column average
df2 = df["Fee"].mean()
print(" Get the mean of the 'Fee' column:\n", df2)

# Using DataFrame.mean() to get the mean of every numeric column
df2 = df.mean(numeric_only=True)
print("Get mean of entire DataFrame:\n", df2)

# Get multiple columns mean using DataFrame.mean()
df2 = df[["Fee","Discount"]].mean()
print("Get mean of specified columns:\n", df2)

# Average of each column using DataFrame.mean()
df2 = df.mean(axis=0, numeric_only=True)
print("Get column-wise mean:\n", df2)

# Find the mean without skipping NaN values using DataFrame.mean()
df2 = df.mean(axis=0, skipna=False, numeric_only=True)
print(df2)

# Using DataFrame.describe() method
df2 = df.describe()
print(df2)

Frequently Asked Questions of Get Column Average or Mean

How do I calculate the mean of a specific column in a pandas DataFrame?

To calculate the mean of a specific column in a pandas DataFrame, select the column and call mean() on it. For example, if you have a DataFrame named df and want the mean of the “Fee” column, use df["Fee"].mean().

Can I calculate the mean for multiple columns at once?

Yes, you can calculate the mean for multiple columns at once by applying the .mean() function to the whole DataFrame or to a subset of it. For example, df[["Fee","Discount"]].mean() gives the mean of those two columns, and df.mean(numeric_only=True) gives the mean of all numeric columns in the DataFrame.

What if my DataFrame contains non-numeric data?

If your DataFrame contains non-numeric data, restrict the calculation to the numeric columns, either by passing numeric_only=True to mean() or by selecting the numeric columns first, and handle the non-numeric data separately. In pandas 2.0 and later, calling mean() on columns that hold strings raises a TypeError.
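The two ways of restricting the calculation to numeric columns can be sketched as follows. This example assumes a DataFrame with one string column and two numeric columns, as in this article; means_flag and means_selected are names introduced for the illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Python", "pandas", None],
        "Fee": [20000, 25000, 22000, None, 30000],
        "Discount": [1000, 2300, 1200, 2000, None],
    }
)

# Option 1: let mean() skip the non-numeric columns
means_flag = df.mean(numeric_only=True)

# Option 2: select the numeric columns explicitly, then average them
means_selected = df.select_dtypes(include="number").mean()

print(means_flag)
print(means_selected)
```

Both options return the same Series here; select_dtypes() is the more flexible choice when you also want to reuse the numeric subset for other calculations.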

What happens if there are missing values (NaN) in the column?

By default, the .mean() function in pandas excludes NaN/null values while calculating the mean or average. If you do not want missing values to be skipped, use the skipna=False parameter, like df['column_name'].mean(skipna=False); the result is then NaN whenever the column contains a missing value.

How can I calculate the mean for each column in a DataFrame?

To calculate the mean for each column in a DataFrame, call the .mean() function with axis=0, which is also the default: df.mean(axis=0). If the DataFrame has non-numeric columns, add numeric_only=True.

Conclusion

In this article, you have learned how to get the column average or mean from a pandas DataFrame using the DataFrame.mean() and DataFrame.describe() methods with examples. Using mean(), you can get the mean of a single column, of selected columns, or of every numeric column.

Happy Learning !!
