Pandas groupby().count() is used to group rows by one or more columns and count the number of occurrences of each unique value in a specific column or combination of columns.
In this article, I will explain how to use groupby() and count() together with examples. The groupby() function collects identical data into groups so that you can apply aggregation functions to the GroupBy object to summarize and analyze the grouped data. Common aggregation functions include sum, mean, count, min, max, and more.
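For instance, here is a minimal sketch of applying several of these aggregation functions at once, using a small hypothetical two-column DataFrame for illustration:

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark"],
    'Fee': [22000, 25000, 25000],
})

# Apply several aggregation functions to the grouped 'Fee' column at once
result = df.groupby('Courses')['Fee'].agg(['sum', 'mean', 'count', 'min', 'max'])
print(result)
```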
1. Quick Examples of groupby() and count() of DataFrame
If you are in a hurry, below are some quick examples of how to group by columns and get the count for each group from DataFrame.
# Below are the quick examples

# Example 1: Using groupby() and count()
df2 = df.groupby(['Courses'])['Courses'].count()

# Example 2: Using groupby() & count() on multiple columns
df2 = df.groupby(['Courses','Duration'])['Fee'].count()

# Example 3: Using groupby() & size() on multiple columns
df2 = df.groupby(['Courses','Duration'])['Fee'].size()

# Example 4: Using size() and max()
df2 = df.groupby(['Courses','Duration']).size().groupby(level=0).max()

# Example 5: Using size().reset_index()
df2 = df.groupby(['Courses','Duration']).size().reset_index(name='counts')

# Example 6: Using agg('count') and reset_index()
df2 = df.groupby(['Courses','Duration'])['Fee'].agg('count').reset_index()

# Example 7: Using DataFrame.transform()
df2 = df.groupby(['Courses','Duration']).Courses.transform('count')

# Example 8: Using groupby() and size() with sort & drop_duplicates
print(df.groupby(['Discount','Duration']).size()
      .sort_values(ascending=False)
      .reset_index(name='count')
      .drop_duplicates(subset='Duration'))
Now, let’s create a DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains the column names Courses, Fee, Duration, and Discount.
# Create a pandas DataFrame.
import pandas as pd
technologies = ({
'Courses':["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark","Python"],
'Fee' :[22000,25000,23000,24000,26000,25000,25000,22000],
'Duration':['30days','50days','35days','40days','60days','35days','55days','50days'],
'Discount':[1000,2300,1000,1200,2500,1300,1400,1600]
})
df = pd.DataFrame(technologies, columns=['Courses','Fee','Duration','Discount'])
print("Create DataFrame:\n", df)
Yields below output.
2. Group the Rows by Column Name and Get Count
Use the Pandas df.groupby() function to group the rows by column and the count() method to get the count for each group, ignoring None and NaN values. It works with non-numeric data as well. The example below groups on the Courses column and calculates how many times each value is present.
# Using groupby() and count()
# to get the count of each group
df2 = df.groupby(['Courses'])['Courses'].count()
print("Get count of each group:\n", df2)
Yields below output.

# Output:
Get count of each group:
Courses
Hadoop     2
Pandas     1
PySpark    1
Python     2
Spark      2
Name: Courses, dtype: int64
If you want to keep only the groups whose count is greater than 1, you can use the filter() function to drop the single-row groups and then count the remaining values.
df2 = df['Courses'].groupby(df['Courses']).filter(lambda x: len(x) > 1).value_counts()
print("Get count of each group:\n", df2)
Yields below output.
# Output:
Get count of each group:
Spark 2
Hadoop 2
Python 2
Name: Courses, dtype: int64
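As a side note, for a single column the same per-value counts can also be obtained with value_counts(), which sorts by count in descending order by default. A small sketch using the same Courses data:

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark","Python"],
})

# value_counts() returns per-value counts, sorted by count descending
counts = df['Courses'].value_counts()
print(counts)
```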
3. By Multiple Columns
You can groupby rows by multiple columns using the groupby() method. This allows you to apply a groupby on multiple columns and calculate a count for each combination group using the count() method. For example, df.groupby(['Courses','Duration'])['Fee'].count() groups on the Courses and Duration columns.
# Using groupby() & count() on multiple column
df2 = df.groupby(['Courses','Duration'])['Fee'].count()
print(df2)
Yields below output. This counts the number of non-null values in the ‘Fee’ column for each combination of ‘Courses’ and ‘Duration’.
# Output:
Courses Duration
Hadoop 35days 2
Pandas 60days 1
PySpark 50days 1
Python 40days 1
50days 1
Spark 30days 1
55days 1
Name: Fee, dtype: int64
As you can see from the above, a new Pandas Series has been created where the index is unique combinations of ‘Courses’ and ‘Duration,’ and the values represent the count of non-null ‘Fee’ values for each of those combinations.
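To make the non-null behavior concrete, here is a small sketch (with a hypothetical NaN in the Fee column) contrasting count(), which skips missing values, with size(), which counts every row:

```python
import pandas as pd
import numpy as np

# Hypothetical data with one missing Fee value
df = pd.DataFrame({
    'Courses': ['Spark', 'Spark', 'Hadoop'],
    'Fee': [22000, np.nan, 23000],
})

# count() skips the NaN Fee; size() counts every row in the group
print(df.groupby('Courses')['Fee'].count())  # Spark -> 1
print(df.groupby('Courses')['Fee'].size())   # Spark -> 2
```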
4. Pandas Groupby Count Sort Descending
Sometimes you would be required to perform a sort (in ascending or descending order) after performing group and count. You can achieve this using the below example.
Note that by default groupby() sorts results by the group key, which takes additional time; if you have a performance concern and don’t need the group keys sorted, you can turn this off with the sort=False param.
# Sorting after groupby() & count()
# Sorting group keys on descending order
groupedDF = df.groupby('Courses',sort=False).count()
sortedDF=groupedDF.sort_values('Courses', ascending=False)['Fee']
print(sortedDF)
Yields below output.
# Output:
Courses
Spark 2
Python 2
PySpark 1
Pandas 1
Hadoop 2
Name: Fee, dtype: int64
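Note the example above sorts by the group key; if instead you want to order groups by their count, you can sort the counts themselves. A sketch, with the sample DataFrame recreated so the snippet runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark","Python"],
    'Fee': [22000,25000,23000,24000,26000,25000,25000,22000],
})

# Sort the groups by their count, largest first
counts = df.groupby('Courses')['Fee'].count().sort_values(ascending=False)
print(counts)
```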
5. Using groupby() with size()
Alternatively, you can also use size() to get the row count for each group. You can use df.groupby(['Courses','Duration']).size() to get the total number of elements for each group of the Courses and Duration columns.
# Using GroupBy & size() on multiple column
df2 = df.groupby(['Courses','Duration'])['Fee'].size()
print(df2)
Use df.groupby(['Courses','Duration']).size().groupby(level=0).max() to specify which index level you want in the output. Note that levels start from zero.
# Using DataFrame.size() and max()
df2 = df.groupby(['Courses','Duration']).size().groupby(level=0).max()
print(df2)
Yields below output.
# Output:
Courses
Hadoop 2
Pandas 1
PySpark 1
Python 1
Spark 1
dtype: int64
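Similarly, passing level=1 aggregates over the second index level (Duration). A sketch with the sample data recreated so it runs standalone:

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark","Python"],
    'Duration': ['30days','50days','35days','40days','60days','35days','55days','50days'],
})

# Max group size per Duration (the second index level)
level1 = df.groupby(['Courses','Duration']).size().groupby(level=1).max()
print(level1)
```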
Then use size().reset_index(name='counts') to assign a name to the count column.
# Use size().reset_index() method
df2 = df.groupby(['Courses','Duration']).size().reset_index(name='counts')
print(df2)
Yields below output.
# Output:
Courses Duration counts
0 Hadoop 35days 2
1 Pandas 60days 1
2 PySpark 50days 1
3 Python 40days 1
4 Python 50days 1
5 Spark 30days 1
6 Spark 55days 1
6. Pandas groupby() and using agg(‘count’)
Alternatively, you can also get the group count by using the agg() or aggregate() function and passing the count aggregation function as a param. The reset_index() function converts the group keys back into regular DataFrame columns. By using this approach you can also compute multiple aggregations at once.
# Using pandas DataFrame.reset_index()
df2 = df.groupby(['Courses','Duration'])['Fee'].agg('count').reset_index()
print(df2)
Yields below output.
# Output:
Courses Duration Fee
0 Hadoop 35days 2
1 Pandas 60days 1
2 PySpark 50days 1
3 Python 40days 1
4 Python 50days 1
5 Spark 30days 1
6 Spark 55days 1
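To illustrate computing multiple aggregations at once, here is a sketch passing a list of functions to agg(), with the sample DataFrame recreated so the snippet runs standalone:

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark","Python"],
    'Fee': [22000,25000,23000,24000,26000,25000,25000,22000],
    'Duration': ['30days','50days','35days','40days','60days','35days','55days','50days'],
})

# Compute count, sum, and mean of Fee per (Courses, Duration) group
df2 = df.groupby(['Courses','Duration'])['Fee'].agg(['count', 'sum', 'mean']).reset_index()
print(df2)
```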
7. Pandas Groupby Count as New Column
You can use df.groupby(['Courses','Duration']).Courses.transform('count') to add a new column containing the group counts to the DataFrame.
# Using DataFrame.transform()
df2 = df.groupby(['Courses','Duration']).Courses.transform('count')
print(df2)
Yields below output.
# Output:
0 1
1 1
2 2
3 1
4 1
5 2
6 1
7 1
Name: Courses, dtype: int64
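One common use of such a per-row count column is filtering, for example keeping only the rows whose Courses value appears more than once. A sketch with the sample data recreated so it runs standalone:

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark","Python"],
    'Fee': [22000,25000,23000,24000,26000,25000,25000,22000],
})

# Attach the per-group count as a new column, then filter on it
df['course_count'] = df.groupby('Courses')['Courses'].transform('count')
popular = df[df['course_count'] > 1]
print(popular)
```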
8. Other Examples
Now let’s see how to sort rows from the result of pandas groupby and drop duplicate rows from pandas DataFrame.
# Use DataFrame.groupby() and size()
print(df.groupby(['Discount','Duration']).size()
.sort_values(ascending=False)
.reset_index(name='count')
.drop_duplicates(subset='Duration'))
Yields below output.
# Output:
Discount Duration count
0 1000 30days 1
1 1000 35days 1
2 1200 40days 1
4 1400 55days 1
5 1600 50days 1
7 2500 60days 1
9. Complete Examples of groupby and count
# Create a pandas DataFrame.
import pandas as pd
technologies = ({
'Courses':["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark","Python"],
'Fee' :[22000,25000,23000,24000,26000,25000,25000,22000],
'Duration':['30days','50days','35days','40days','60days','35days','55days','50days'],
'Discount':[1000,2300,1000,1200,2500,1300,1400,1600]
})
df = pd.DataFrame(technologies, columns=['Courses','Fee','Duration','Discount'])
print(df)
# Using groupby() and count()
df2 = df.groupby(['Courses'])['Courses'].count()
print(df2)
# Using GroupBy & count() on multiple column
df2 = df.groupby(['Courses','Duration'])['Fee'].count()
print(df2)
# Using GroupBy & size() on multiple column
df2 = df.groupby(['Courses','Duration'])['Fee'].size()
print(df2)
# Using DataFrame.size() and max()
df2 = df.groupby(['Courses','Duration']).size().groupby(level=0).max()
print(df2)
# Use size().reset_index() method
df2 = df.groupby(['Courses','Duration']).size().reset_index(name='counts')
print(df2)
# Using pandas DataFrame.reset_index()
df2 = df.groupby(['Courses','Duration'])['Fee'].agg('count').reset_index()
print(df2)
# Using DataFrame.transform()
df2 = df.groupby(['Courses','Duration']).Courses.transform('count')
print(df2)
# Use DataFrame.groupby() and Size()
print(df.groupby(['Discount','Duration']).size()
.sort_values(ascending=False)
.reset_index(name='count')
.drop_duplicates(subset='Duration'))
Frequently Asked Questions of Pandas groupby() count
How do I get the count of each group?
You can use the groupby() function with the count() function of Pandas to count the number of occurrences of each group.

What does the count() function do?
The count() function is used to count the number of non-null values in each column of a DataFrame or Series.

What is the groupby() function used for?
Pandas DataFrame.groupby() function is used to collect identical data into groups and perform operations on these grouped data, such as aggregation, counting, or applying custom functions.

How do I use groupby() to group data by a specific column?
You can use the groupby() function to group data in a DataFrame by a specific column. For example, first, create a DataFrame df with two columns, 'Courses' and 'Fee'. To group the data by the ‘Courses’ column, use the groupby() function and pass the column name ‘Courses’ as an argument. It returns a GroupBy object that you can use for various operations, such as aggregation or further data analysis.

Can I group by multiple columns?
Yes, you can group by multiple columns by passing a list of column names to the groupby() function.

How do I count occurrences within a groupby() object?
You can use the count() function on a groupby() object to count the number of occurrences of each group.
Conclusion
In this article, you have learned how to group single and multiple columns and get the row counts for each group from a Pandas DataFrame using the df.groupby(), size(), count() and DataFrame.transform() methods with examples.
Happy Learning !!
Related Articles
- Pandas groupby() transform
- Pandas groupby() aggregate explained
- Pandas groupby() multiple columns explained
- Pandas groupby() sort within groups
- Pandas groupby() with example
- Pandas groupby() and sum() with examples.
- Convert groupby() output from Series to DataFrame
- Pandas Group Rows into List Using groupby()
- Pandas DataFrame count() Function
- How to groupby() index in Pandas DataFrame
- Pandas Percentage Total With Groupby