• Post author:
  • Post category:Pandas
  • Post last modified:November 13, 2024
  • Reading time:12 mins read
You are currently viewing How to GroupBy Index in Pandas?

How to perform groupby index in pandas? Pass the index name of the DataFrame as a parameter to the groupby() function to group rows on an index. DataFrame.groupby() function takes a string or list as a parameter to specify the group columns or index. You can also use the combination of index and columns when using the list.

Advertisements

In this article, I will explain how to perform groupby on a single index, multiple indexes, or a combination of column and index with examples.

Key Points –

  • Use the groupby() function to group data based on index values.
  • You can group by a single index level or multiple levels in a MultiIndex.
  • After grouping, aggregation functions like sum()mean()count(), etc., can be applied to summarize data.
  • groupby() can handle NaN values in index levels, grouping them separately.
  • The grouped result retains the index, but the DataFrame structure may change depending on the operation applied.

Quick Examples of Performing GroupBy on Index

Following are quick examples of how to perform group by on index field.


# Quick examples of performing groupby on index

# Example 1: Create Dataframe
df = pd.DataFrame(technologies)
# Set index to DataFrame
df.set_index(['Courses','Fee'], inplace=True)

# Example 2: Groupby index
result = df.groupby('Courses').sum()

# Example 3: Groupby multiple index
result = df.groupby(['Courses','Fee']).sum()

# Example 4: Groupby column & index
result = df.groupby(['Courses','Duration']).sum()

Let’s create a pandas DataFrame from the Dict object and explore the above examples.


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","PySpark","Spark","Spark"],
    'Fee' :[20000,25000,26000,22000,25000,20000,35000],
    'Duration':['30day','40days','35days','40days','60days','60days','70days'],
    'Discount':[1000,2300,1200,2500,2000,2000,3000]
              }

df = pd.DataFrame(technologies)
df.set_index(['Courses','Fee'], inplace=True)
print("Create DataFrame:\n", df)

Yields below output. As you notice in the above example, I have used DataFrame.set_index() to set the multiple columns as Index. I will use these two indexes to group rows.

pandas groupby index

GroupBy Index Example

DataFrame.gorupby() accepts string or list of column or index names to perform group in pandas DataFrame. The index name is required to perform, If you don’t have it, set the name to index by using DataFrame.index.name = 'index-name'.


# Groupby Index
result = df.groupby('Courses').sum()
print("Get the grouped data beased on index:\n", result)

Yields below output.

pandas groupby index

GroupBy Multiple Index

Now let’s see how to group by multiple index fields at the same time, to do so pass all index names as a list. The below example groups rows by Courses and Fee index.


# Groupby multiple index
result = df.groupby(['Courses','Fee']).sum()
print("Get the grouped data beased on multiple indexes:\n", result)

Yields below output.


# Output:
# Get the grouped data beased on multiple indexes:
               Discount
Courses Fee            
Hadoop  26000      1200
PySpark 25000      4300
Python  22000      2500
Spark   20000      3000
        35000      3000

Use Both Column and Index on GroupBy

Sometimes you may also be required to do group by column and index at the same time, the good thing about groupby() function is it accepts both at the same time.


# Groupby Column & index
result = df.groupby(['Courses','Duration']).sum()
print("Get the grouped data beased on index and column:\n", result)

Note that my DataFrame doesn’t have the right date to the group on Courses and Duration columns hence it listed all rows from DataFrame. But I hope you understand the concept of how to do it.


# Output:
# Get the grouped data beased on index and column:
                  Discount
Courses Duration          
Hadoop  35days        1200
PySpark 40days        2300
        60days        2000
Python  40days        2500
Spark   30day         1000
        60days        2000
        70days        3000

Complete Example


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","PySpark","Spark","Spark"],
    'Fee' :[20000,25000,26000,22000,25000,20000,35000],
    'Duration':['30day','40days','35days','40days','60days','60days','70days'],
    'Discount':[1000,2300,1200,2500,2000,2000,3000]
              }

df = pd.DataFrame(technologies)
df.set_index(['Courses','Fee'], inplace=True)
print(df)

# Groupby Index
result = df.groupby('Courses').sum()
print(result)

# Groupby multiple index
result = df.groupby(['Courses','Fee']).sum()
print(result)

# Groupby Column & index
result = df.groupby(['Courses','Duration']).sum()
print(result)

Frequently Asked Questions of Pandas Groupby Index

How do I group data by the index in Pandas?

You can use the groupby() function to group the Pandas by an index for that you need to pass the index column or list of column indexes as the argument of the groupby() function.

How can I group by multiple levels of the index?

You can use the groupby() function to group the Pandas by multiple levels of the index for that you need to pass the list of column indexes as the argument of the groupby() function.

How to aggregate data after grouping by the index?

You can apply aggregation functions like sum(), mean(), etc., after grouping the data by using the groupby() Function.

How can I apply multiple aggregation functions to grouped data?

You can use the agg() method and provide dictionary mapping columns to perform multiple aggregation functions to grouped data.

How to reset the index after grouping?

You can use the reset_index() method to convert the grouped data back to a DataFrame with a default integer index.

Conclusion

In this article, you have learned how to perform group by on a single index, multiple indexes, and a combination of column and index using Pandas groupby() function with multiple examples.

Reference