# Pandas Get Statistics For Each Group?

How to get statistics for each group (such as count, mean, max, min e.tc) using pandas GroupBy? You can achieve this by using groupby() method and agg() function.

In this article, you can learn `pandas.DataFrame.groupby()` to group the single column, two, or multiple columns and get the `size()``count()`  for each group combination. `groupBy()` function is used to collect the identical data into groups and perform aggregate functions like size/count on the grouped data.

In Pandas, you can use `groupby()` with the combination of `count()``size()`, `mean()`, `min()`, `max()` and more methods.

## 1. Quick Examples of Pandas Get Statistics For Each Group

If you are in a hurry, below are some quick examples.

``````
# Below are some quick examples.
# Use DataFrame.size()
df2=df.groupby(['Courses','Duration'])
df2=df.groupby(['Courses', 'Duration']).size().reset_index(name='counts'))

# Pandas groupby() with agg() Method.
df2 = df.groupby(['Courses', 'Duration']).agg(['mean', 'count'])

# # Pandas Get Statistics Using groupby().describe()
df2=df.groupby(['Courses', 'Duration'])['Discount'].describe()

# Pandas DataFrame.groupby() and describe() function.
df2=df.groupby(['Courses', 'Duration'])['Discount'].describe()[['count', 'mean']]

# Get statistics by DataFrame.value_counts.
df2=df.value_counts(subset=['Courses', 'Duration'])

# Using groupby() and agg() function.
df2 = df.groupby(['Courses','Duration']).agg(['mean', 'count'])
df.columns = [ ' '.join(str(i) for i in col) for col in df.columns]
df.reset_index(inplace=True)
``````

Now, let’s create a DataFrame with a few rows and columns, execute these examples and validate results. Our DataFrame contains column names `Courses``Fee``Duration`, and `Discount`.

``````
# Create a DataFrame.
import pandas as pd
technologies   = ({
'Fee' :[22000,25000,23000,24000,26000,25000,25000,22000,25000],
'Duration':['30days','50days','55days', '40days','55days','35days','30days','40days','40days'],
'Discount':[1000,2300,1000,1200,2500,1200,1400,1000,1200]
})
df = pd.DataFrame(technologies)
print(df)
``````

Yields below output.

``````
Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2300
3   Python  24000   40days      1200
6    Spark  25000   30days      1400
7   Python  22000   40days      1000
8    Spark  25000   40days      1200
``````

## 2. Pandas groupby() with size() to get Column Counts

Use DataFrame.groupby() to group the rows and use `size()` to get the count on each group. The `size` property is used to get an int representing the number of elements in this object. For the Series object, it returns the number of rows. For the DataFrame object, it returns the number of rows times the number of columns (rows * columns).

``````
# Use DataFrame.size()
df2=df.groupby(['Courses','Duration'])
df2=df.groupby(['Courses', 'Duration']).size().reset_index(name='counts')
print(df2)
``````

Yields below output. Note the `groupby()` method returns the `pandas.core.groupby.generic.DataFrameGroupBy`. `reset_index(name='counts')` is used to set the label name for the size column.

``````
Courses Duration  counts
2  PySpark   50days       1
3   Python   40days       2
4    Spark   30days       2
5    Spark   40days       1
``````

## 3. Pandas groupby() with agg() Method

Alternatively, you can also use `groupby()` method and use `agg()` function to the size/count. The `agg()` method allows you to apply a function or a list of function names to be executed along with one of the axis of the DataFrame, axis by default set to 0, which is the index (row) axis.

To get the counts and mean for each group combination use `count` and `mean` aggregations.

``````
# Pandas .groupby() and using agg() Method.
df2 = df.groupby(['Courses', 'Duration']).agg(['mean', 'count'])
print(df2)
``````

Yields below output.

``````
Fee       Discount
mean count     mean count
Courses Duration
Hadoop  35days    25000.0     1   1200.0     1
55days    24500.0     2   1750.0     2
PySpark 50days    25000.0     1   2300.0     1
Python  40days    23000.0     2   1100.0     2
Spark   30days    23500.0     2   1200.0     2
40days    25000.0     1   1200.0     1
``````

## 4. Pandas Get Statistics Using groupby().describe()

In this section, we can get statistics using `groupby().describe()` function. The describe() function is used as a summarization tool that quickly displays statistics for any `variable` or` group` it is applied to. The `describe()` output varies depending on whether you apply it to a` numeric` or `character` column.

``````
# Pandas Get Statistics Using groupby().describe()
df2=df.groupby(['Courses', 'Duration'])['Discount'].describe()
print(df2)
``````

Yields below output.

``````

count    mean          std  ...     50%     75%     max
Courses Duration                              ...
Hadoop  35days      1.0  1200.0          NaN  ...  1200.0  1200.0  1200.0
55days      2.0  1750.0  1060.660172  ...  1750.0  2125.0  2500.0
PySpark 50days      1.0  2300.0          NaN  ...  2300.0  2300.0  2300.0
Python  40days      2.0  1100.0   141.421356  ...  1100.0  1150.0  1200.0
Spark   30days      2.0  1200.0   282.842712  ...  1200.0  1300.0  1400.0
40days      1.0  1200.0          NaN  ...  1200.0  1200.0  1200.0
``````

You can just use the built-in function `count()` and` mean()` follow by the `DataFrame.groupby()` and `describe()` function.

``````
# Pandas DataFrame.groupby() and describe() function.
df2=df.groupby(['Courses', 'Duration'])['Discount'].describe()[['count', 'mean']]
print(df2)
``````

Yields below output.

``````
count    mean
Courses Duration
55days      2.0  1750.0
PySpark 50days      1.0  2300.0
Python  40days      2.0  1100.0
Spark   30days      2.0  1200.0
40days      1.0  1200.0
``````

## 5. Get Statistics of Each Group by DataFrame.value_counts

To get Pandas statistics of each group by `DataFrame.value_counts`. The` value_counts()` function is used to get a Series containing counts of unique values.

``````
# Get statistics by DataFrame.value_counts.
df2=df.value_counts(subset=['Courses', 'Duration'])
print(df2)
``````

Yields below output.

``````
Courses  Duration
Python   40days      2
Spark    30days      2
PySpark  50days      1
Spark    40days      1
dtype: int64
``````

## 6. Other Examples

In this section, To get multiple stats, collapse the index, and retain column names. For example-

``````
# Using groupby() and agg() function.
df2 = df.groupby(['Courses','Duration']).agg(['mean', 'count'])
df.columns = [ ' '.join(str(i) for i in col) for col in df.columns]
df.reset_index(inplace=True)
print(df2)
``````

Yields below output.

``````
Courses  Duration
Python   40days      2
Spark    30days      2
PySpark  50days      1
Spark    40days      1
dtype: int64
``````

## 7. Complete Examples to Get Statistics of Each Group

``````
# Create a DataFrame.
import pandas as pd
technologies   = ({
'Fee' :[22000,25000,23000,24000,26000,25000,25000,22000,25000],
'Duration':['30days','50days','55days', '40days','55days','35days','30days','40days','40days'],
'Discount':[1000,2300,1000,1200,2500,1200,1400,1000,1200]
})
df = pd.DataFrame(technologies)
print(df)

# Use DataFrame.size()
df2=df.groupby(['Courses','Duration'])
df2=df.groupby(['Courses', 'Duration']).size().reset_index(name='counts'))

# Pandas groupby() with agg() Method.
df2 = df.groupby(['Courses', 'Duration']).agg(['mean', 'count'])

# # Pandas Get Statistics Using groupby().describe()
df2=df.groupby(['Courses', 'Duration'])['Discount'].describe()

# Pandas DataFrame.groupby() and describe() function.
df2=df.groupby(['Courses', 'Duration'])['Discount'].describe()[['count', 'mean']]

# Get statistics by DataFrame.value_counts.
df2=df.value_counts(subset=['Courses', 'Duration'])

# Using groupby() and agg() function.
df2 = df.groupby(['Courses','Duration']).agg(['mean', 'count'])
df.columns = [ ' '.join(str(i) for i in col) for col in df.columns]
df.reset_index(inplace=True)
``````

## Conclusion

In this article, you have learned how to groupby() single and multiple columns and get counts, size, max, min, mean for each group from pandas DataFrame. Also, learn how to get the stats using describe() build-in function.