• Post author:
  • Post category:Pandas
  • Post last modified:March 27, 2024
  • Reading time:19 mins read
You are currently viewing Pandas groupby() and count() with Examples

Pandas groupby().count() is used to group columns and count the number of occurrences of each unique value in a specific column or combination of columns.

Advertisements

In this article, I will explain how to use groupby() and count() aggregate together with examples. groupby() function is used to collect identical data into groups and apply aggregation functions to the GroupBy object to summarize and analyze the grouped data. Common aggregation functions include sum, mean, count, min, max, and more.

1. Quick Examples of groupby() and count() of DataFrame

If you are in a hurry, below are some quick examples of how to group by columns and get the count for each group from DataFrame.


# Below are the quick examples

# Example 1: Using groupby() and count()
df2 = df.groupby(['Courses'])['Courses'].count()

# Example 2: Using GroupBy & count() on multiple column
df2 = df.groupby(['Courses','Duration'])['Fee']
      .count()

# Example 3: Using GroupBy & size() on multiple column
df2 = df.groupby(['Courses','Duration'])['Fee']
      .size()

# Example 4: Using DataFrame.size() and max()
df2 = df.groupby(['Courses','Duration'])
      .size().groupby(level=0).max() 

# Example 5: Use size().reset_index() method
df2 = df.groupby(['Courses','Duration'])
      .size().reset_index(name='counts')

# Example 6: Using pandas DataFrame.reset_index()
df2 = df.groupby(['Courses','Duration'])
      .['Fee'].agg('count').reset_index()

# Example 7: Using DataFrame.transform()
df2 = df.groupby(['Courses','Duration'])
      .Courses.transform('count')

# Example 8: Use DataFrame.groupby() and Size() 
print(df.groupby(['Discount','Duration'])
     .size() 
     .sort_values(ascending=False) 
     .reset_index(name='count') 
     .drop_duplicates(subset='Duration'))

Now, let’s create a DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains column names Courses, Fee, Duration, and Discount.


# Create a pandas DataFrame.
import pandas as pd
technologies   = ({
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark","Python"],
    'Fee' :[22000,25000,23000,24000,26000,25000,25000,22000],
    'Duration':['30days','50days','35days','40days','60days','35days','55days','50days'],
    'Discount':[1000,2300,1000,1200,2500,1300,1400,1600]
                })
df = pd.DataFrame(technologies, columns=['Courses','Fee','Duration','Discount'])
print("Create DataFrame:\n", df)

Yields below output.

Pandas groupby count

2. Group the Rows by Column Name and Get Count

Use the Pandas df.groupby() function to group the rows by column and use the count() method to get the count for each group by ignoring None and Nan values. It works with non-floating type data as well. The below example does the grouping on the Courses column and calculates how many times each value is present.


# Using groupby() and count() 
# to get the count of each group
df2 = df.groupby(['Courses'])['Courses']
        .count()
print("Get count of each group:\n", df2)

Yields below output.

Pandas groupby count

If you want to get the count of each group greater than 1, you can use the filter() function and get each group that contains the count number as 2.


df2 = df['Courses'].groupby(df['Courses']).filter(lambda x: len(x) > 1).value_counts()
print("Get count of each group:\n", df2)

Yields below output.


# Output:
Get count of each group:
 Spark     2
Hadoop    2
Python    2
Name: Courses, dtype: int64

3. By Multiple Columns

You can groupby rows by multiple columns using the groupby() method. This allows you to apply a groupby on multiple columns and calculate a count for each combination group using the count() method. For example, df.groupby(['Courses','Duration'])['Fee'].count() does group on Courses and Duration column.


# Using groupby() & count() on multiple column
df2 = df.groupby(['Courses','Duration'])['Fee']
        .count()
print(df2)

Yields below output. This counts the number of non-null values in the ‘Fee’ column for each combination of ‘Courses’ and ‘Duration’.


# Output:
Courses  Duration
Hadoop   35days      2
Pandas   60days      1
PySpark  50days      1
Python   40days      1
         50days      1
Spark    30days      1
         55days      1
Name: Fee, dtype: int64

As you can see from the above, a new Pandas Series has been created where the index is unique combinations of ‘Courses’ and ‘Duration,’ and the values represent the count of non-null ‘Fee’ values for each of those combinations.

4. Pandas Groupby Count Sort Descending

Sometimes you would be required to perform a sort (ascending or descending order) after performing group and count. You can achieve this using the below example.

Note that by default groupby sorts results by group key hence, it will take additional time, if you have a performance issue and don’t want to sort the group by the result, you can turn this off by using the sort=False param.


# Sorting after groupby() & count()
# Sorting group keys on descending order
groupedDF = df.groupby('Courses',sort=False).count()
sortedDF=groupedDF.sort_values('Courses', ascending=False)['Fee']
print(sortedDF)

Yields below output.


# Output:
Courses
Spark      2
Python     2
PySpark    1
Pandas     1
Hadoop     2
Name: Fee, dtype: int64

5. Using groupby() with size()

Alternatively, you can also use size() to get the rows count for each group. You can use df.groupby(['Courses','Duration']).size() to get a total number of elements for each group Courses and Duration. columns


# Using GroupBy & size() on multiple column
df2 = df.groupby(['Courses','Duration'])['Fee']
        .size()
print(df2)

Use df.groupby(['Courses','Duration']).size().groupby(level=1).max() to specify which level you want as output. Note that the level starts from zero.


# Using DataFrame.size() and max()
df2 = df.groupby(['Courses','Duration'])
        .size().groupby(level=0).max() 
print(df2)

Yields below output.


# Output:
Courses
Hadoop     2
Pandas     1
PySpark    1
Python     1
Spark      1
dtype: int64

Then use size().reset_index(name='counts') to assign a name to the count column.


# Use size().reset_index() method
df2 = df.groupby(['Courses','Duration'])
        .size().reset_index(name='counts')
print(df2)

Yields below output.


# Output:
   Courses Duration  counts
0   Hadoop   35days       2
1   Pandas   60days       1
2  PySpark   50days       1
3   Python   40days       1
4   Python   50days       1
5    Spark   30days       1
6    Spark   55days       1

6. Pandas groupby() and using agg(‘count’)

Alternatively, you can also get the group count by using agg() or aggregate() function and passing the aggregate count function as a param. reset_index() function is used to set the index on DataFrame. By using this approach you can compute multiple aggregations.


# Using pandas DataFrame.reset_index()
df2 = df.groupby(['Courses','Duration'])
 ['Fee'].agg('count').reset_index()
print(df2)

Yields below output.


# Output:
   Courses Duration  Fee
0   Hadoop   35days    2
1   Pandas   60days    1
2  PySpark   50days    1
3   Python   40days    1
4   Python   50days    1
5    Spark   30days    1
6    Spark   55days    1

7. Pandas Groupby Count as New Column

You can use df.groupby(['Courses','Fee']).Courses.transform('count') to add a new column containing the groups counts into the DataFrame.


# Using DataFrame.transform()
df2 = df.groupby(['Courses','Duration'])
        .Courses.transform('count')
print(df2)

Yields below output.


# Output:
0    1
1    1
2    2
3    1
4    1
5    2
6    1
7    1
Name: Courses, dtype: int64

8. Other Examples

Now let’s see how to sort rows from the result of pandas groupby and drop duplicate rows from pandas DataFrame.


# Use DataFrame.groupby() and Size() 
print(df.groupby(['Discount','Duration']).size() 
   .sort_values(ascending=False) 
   .reset_index(name='count') 
   .drop_duplicates(subset='Duration'))

Yields below output.


# Output:
Name: Courses, dtype: int64
   Discount Duration  count
0      1000   30days      1
1      1000   35days      1
2      1200   40days      1
4      1400   55days      1
5      1600   50days      1
7      2500   60days      1

9. Complete Examples of groupby and count


# Create a pandas DataFrame.
import pandas as pd
technologies   = ({
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark","Python"],
    'Fee' :[22000,25000,23000,24000,26000,25000,25000,22000],
    'Duration':['30days','50days','35days','40days','60days','35days','55days','50days'],
    'Discount':[1000,2300,1000,1200,2500,1300,1400,1600]
                })
df = pd.DataFrame(technologies, columns=['Courses','Fee','Duration','Discount'])
print(df)

# Using groupby() and count()
df2 = df.groupby(['Courses'])['Courses'].count()
print(df2)

# Using GroupBy & count() on multiple column
df2 = df.groupby(['Courses','Duration'])['Fee'].count()
print(df2)

# Using GroupBy & size() on multiple column
df2 = df.groupby(['Courses','Duration'])['Fee'].size()
print(df2)

# Using DataFrame.size() and max()
df2 = df.groupby(['Courses','Duration']).size().groupby(level=0).max() 
print(df2)

# Use size().reset_index() method
df2 = df.groupby(['Courses','Duration']).size().reset_index(name='counts')
print(df2)

# Using pandas DataFrame.reset_index()
df2 = df.groupby(['Courses','Duration'])['Fee'].agg('count').reset_index()
print(df2)

# Using DataFrame.transform()
df2 = df.groupby(['Courses','Duration']).Courses.transform('count')
print(df2)

# Use DataFrame.groupby() and Size() 
print(df.groupby(['Discount','Duration']).size() 
   .sort_values(ascending=False) 
   .reset_index(name='count') 
   .drop_duplicates(subset='Duration'))

Frequently Asked Questions of Pandas groupby() count

How to use groupby() with count() in pandas?

You can use the groupby() function with the count() function of Pandas to count the number of occurrences of each group.

What does count() do in pandas?

The count() function is used to count the number of non-null values in each column of a DataFrame or Series.

What is groupby() in Pandas, and why is it useful?

Pandas DataFrame.groupby() function is used to collect identical data into groups, and perform operations on these grouped data, such as aggregation, counting, or applying custom functions.

How do I use groupby() to group data by a specific column?

You can use the groupby() function to group data in a DataFrame by a specific column. For example, first, create a DataFrame df with two columns, 'Courses' and 'Fee'. To group the data by the ‘Courses’ column, you can use the groupby() function and pass the column name ‘Courses’ as an argument. It will return a GroupBy object that you can use for various operations, such as aggregation or further data analysis.

Can I group by multiple columns simultaneously?

Yes, you can group by multiple columns by passing a list of column names to the groupby() function

How do I count the number of occurrences of each group in a groupby() object?

You can use the count() function on a groupby() object to count the number of occurrences of each group.

Conclusion

In this article, you have learned how to group single and multiple columns and get the row counts for each group from Pandas DataFrame using df.groupby(), size(), count() and DataFrame.transform() methods with examples.

Happy Learning !!

Leave a Reply

This Post Has One Comment

  1. Anonymous

    this is good, but it would be nice if you had covered a basic idea of

    course.count(students) > 10
    meaning courses which are subscribed by more than 10 students