  • Post category:Pandas
  • Post last modified:December 9, 2024
Pandas Group Rows into List Using groupby()

You can group DataFrame rows into lists by calling pandas.DataFrame.groupby() on the column of interest, selecting the column whose values you want to collect, and then calling apply(list) to build one list per group. In this article, I will explain how to group rows into lists using a few examples.


Key Points –

  • The groupby() function is used to group DataFrame rows based on the values in one or more columns.
  • Use .apply(list) or .agg(list) after grouping to convert the grouped values into lists.
  • Use .reset_index(name=...) on the grouped Series to turn the group keys back into a regular column and name the column that holds the lists.
  • You can aggregate multiple columns into lists by specifying them in the .agg() function.
  • Use .agg() with a custom lambda function (lambda x: list(x)) for specific control over the aggregation process.
  • The groupby() method can be chained with other operations (e.g., filtering, sorting, etc.) for more complex data transformations.

Quick Examples of Grouping Rows into a List

Below are some quick examples of grouping rows into a list in a Pandas DataFrame.


# Quick examples pandas group rows into list 

# Group Rows on 'Courses' column
# Get List for 'Fee' column
df2 = df.groupby('Courses')['Fee'].apply(list)

# Assign a column name to the grouped list
df2 = df.groupby('Courses')['Fee'].apply(list).reset_index(name="Course_Fee")

# Group rows into list
df2 = df.groupby("Courses").agg({"Discount": lambda x: list(x)})

# Group Rows into list on all columns
df2 = df.groupby("Courses").agg(list)

# Other way
df2 = df.groupby('Courses').agg(pd.Series.tolist)

Now, let’s create a DataFrame with a few rows and columns and execute these examples and validate results. Our DataFrame contains column names Courses, Fee, Duration, and Discount.


import pandas as pd
technologies = ({
     'Courses':["Spark","PySpark","Hadoop","Python","pandas","PySpark","Python","pandas"],
     'Fee' :[24000,25000,25000,24000,24000,25000,25000,24000],
     'Duration':['30day','40days','35days', '40days','60days','50days','55days','35days'],
     'Discount':[1000,2300,1500,1200,2500,2100,2000,2500]
              })
df = pd.DataFrame(technologies)
print(df)

This yields the output below.


   Courses    Fee Duration  Discount
0    Spark  24000    30day      1000
1  PySpark  25000   40days      2300
2   Hadoop  25000   35days      1500
3   Python  24000   40days      1200
4   pandas  24000   60days      2500
5  PySpark  25000   50days      2100
6   Python  25000   55days      2000
7   pandas  24000   35days      2500

Pandas DataFrame.groupby() To Group Rows into List

Using the DataFrame.groupby() function, you can group rows on a column, select the column whose values you want from the grouped result, and finally convert those values to a list for each group using apply(list).


# Group Rows on 'Courses' column and get List for 'Fee' column
df2 = df.groupby('Courses')['Fee'].apply(list)
print(df2)

Note that df.groupby('Courses')['Fee'] returns a SeriesGroupBy object. Calling apply(list) on it converts each group's values to a list and returns a Series indexed by Courses. This example yields the output below.


Courses
Hadoop            [25000]
PySpark    [25000, 25000]
Python     [24000, 25000]
Spark             [24000]
pandas     [24000, 24000]
Name: Fee, dtype: object
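Because the result is a Series indexed by course name, the list for a single group can be looked up by label. The following is a minimal sketch on a smaller sample than the article's DataFrame; the data and the variable names fees, pyspark_fees, and fees_by_course are illustrative assumptions.

```python
import pandas as pd

# Smaller illustrative sample using the article's column names
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "PySpark"],
    "Fee": [24000, 25000, 25000, 26000],
})

fees = df.groupby("Courses")["Fee"].apply(list)

# Look up the list for one group by its label
pyspark_fees = fees.loc["PySpark"]
print(pyspark_fees)

# Convert the whole result to a plain dict of {course: list of fees}
fees_by_course = fees.to_dict()
print(fees_by_course)
```

A dict like this can be convenient when the lists need to be passed to code that does not use pandas.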

Assign Column Name to Groupby List Result

Call .reset_index(name="Course_Fee") on the grouped result to convert it to a DataFrame and name the list column Course_Fee.


# Assign a column name to the grouped list
df2 = df.groupby('Courses')['Fee'].apply(list).reset_index(name="Course_Fee")
print(df2)

This yields the output below.


   Courses     Course_Fee
0   Hadoop         [25000]
1  PySpark  [25000, 25000]
2   Python  [24000, 25000]
3    Spark         [24000]
4   pandas  [24000, 24000]
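The name argument works here because the grouped result is a Series. When the result is already a DataFrame, one alternative is pandas named aggregation, where the keyword passed to agg() becomes the output column name. This is a sketch on a reduced sample; the data and the variable name result are illustrative assumptions.

```python
import pandas as pd

# Smaller illustrative sample using the article's column names
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "PySpark"],
    "Fee": [24000, 25000, 25000, 26000],
})

# Named aggregation: output_column=(input_column, aggregation_function)
result = (
    df.groupby("Courses")
      .agg(Course_Fee=("Fee", list))
      .reset_index()  # move 'Courses' from the index back to a column
)
print(result)
```

Named aggregation also lets you name several list columns in a single agg() call.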

Group Rows into List Using agg() & Lambda Function

Alternatively, you can group rows into a list using df.groupby("Courses").agg({"Discount": lambda x: list(x)}). Here, groupby() groups the rows on the Courses column, and agg() applies the lambda to the Discount values of every group.


# Group Rows into list
df2 = df.groupby("Courses").agg({"Discount": lambda x: list(x)})
print(df2)

This yields the output below.


             Discount
Courses              
Hadoop         [1500]
PySpark  [2300, 2100]
Python   [1200, 2000]
Spark          [1000]
pandas   [2500, 2500]
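A lambda is most useful when the list needs extra processing. As one possible illustration, the sketch below removes duplicates and sorts the values within each group; the sample data and the name unique_discounts are assumptions made for this example.

```python
import pandas as pd

# Illustrative sample: 'pandas' has a repeated discount value
df = pd.DataFrame({
    "Courses": ["pandas", "Python", "pandas", "Python"],
    "Discount": [2500, 2000, 2500, 1200],
})

# Collect each group's distinct Discount values as a sorted list
unique_discounts = df.groupby("Courses").agg(
    {"Discount": lambda x: sorted(set(x))}
)
print(unique_discounts)
```

If you only need the values as they appear, passing list directly, as in agg({"Discount": list}), is equivalent to lambda x: list(x).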

Pandas Group Rows into List on All Columns

Let’s see how to group rows into lists for all DataFrame columns. This produces one list column for every column other than the grouping column.


# Group Rows into List on All columns
df2 = df.groupby("Courses").agg(list)
print(df2)

This yields the output below.


                    Fee          Duration      Discount
Courses                                                
Hadoop          [25000]          [35days]        [1500]
PySpark  [25000, 25000]  [40days, 50days]  [2300, 2100]
Python   [24000, 25000]  [40days, 55days]  [1200, 2000]
Spark           [24000]           [30day]        [1000]
pandas   [24000, 24000]  [60days, 35days]  [2500, 2500]

You can also get the same result by passing pd.Series.tolist to agg().


# Pass pd.Series.tolist to agg() on the grouped DataFrame
df2 = df.groupby('Courses').agg(pd.Series.tolist)
print(df2)
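To confirm that the two forms agree, you can compare their results directly. The sketch below uses a reduced sample and compares the results as plain dictionaries; the data and the names by_list, by_tolist, and same are illustrative assumptions.

```python
import pandas as pd

# Smaller illustrative sample using the article's column names
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "PySpark"],
    "Fee": [24000, 25000, 25000, 26000],
    "Duration": ["30days", "40days", "35days", "50days"],
})

by_list = df.groupby("Courses").agg(list)
by_tolist = df.groupby("Courses").agg(pd.Series.tolist)

# Compare as nested dicts: {column: {course: list of values}}
same = by_list.to_dict() == by_tolist.to_dict()
print(same)
```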

Complete Example For Reference


import pandas as pd
technologies = ({
     'Courses':["Spark","PySpark","Hadoop","Python","pandas","PySpark","Python","pandas"],
     'Fee' :[24000,25000,25000,24000,24000,25000,25000,24000],
     'Duration':['30day','40days','35days', '40days','60days','50days','55days','35days'],
     'Discount':[1000,2300,1500,1200,2500,2100,2000,2500]
              })
df = pd.DataFrame(technologies)
print(df)

# Group rows on 'Courses' and collect 'Fee' into a list per group
df2 = df.groupby('Courses')['Fee'].apply(list)
print(df2)

# Assign a column name to the grouped list
df2 = df.groupby('Courses')['Fee'].apply(list).reset_index(name="Course_Fee")
print(df2)

# Group rows into a list using agg() and a lambda function
df2 = df.groupby("Courses").agg({"Discount": lambda x: list(x)})
print(df2)

# Group rows into lists on all columns
df2 = df.groupby("Courses").agg(list)
print(df2)

# Same result using pd.Series.tolist
df2 = df.groupby('Courses').agg(pd.Series.tolist)
print(df2)

FAQ on Pandas Group Rows into List Using groupby()

What is the purpose of grouping rows into a list in Pandas?

Grouping rows into a list is useful when you want to consolidate data from multiple rows into a single row for each group, making it easier to analyze or process grouped data in a compact format.

How do I group rows into a list in pandas?

To group rows into a list in pandas, you can use the .groupby() method along with .agg(list).

Can I group multiple columns into lists?

You can group multiple columns into lists in pandas! Use the .agg(list) function for each column you want to aggregate into a list.
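For instance, you can select a subset of columns after groupby() and aggregate only those into lists. This is a minimal sketch on a reduced sample; the data and the variable name subset are illustrative assumptions.

```python
import pandas as pd

# Smaller illustrative sample using the article's column names
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "PySpark"],
    "Fee": [24000, 25000, 25000, 26000],
    "Duration": ["30days", "40days", "35days", "50days"],
    "Discount": [1000, 2300, 1500, 2100],
})

# Select only the columns to aggregate, then collect each into a list
subset = df.groupby("Courses")[["Fee", "Discount"]].agg(list)
print(subset)
```

The Duration column is left out of the result because it was not selected.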

Can I customize the aggregation instead of using lists?

You can customize the aggregation when using pandas groupby(). Instead of aggregating into lists, you can apply custom functions or predefined aggregations like sum, mean, max, or any other operation.
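As a sketch of mixing aggregations, the dictionary passed to agg() below assigns a different operation to each column. The sample data and the variable name summary are assumptions made for this example.

```python
import pandas as pd

# Smaller illustrative sample using the article's column names
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "PySpark"],
    "Fee": [24000, 25000, 25000, 26000],
    "Duration": ["30days", "40days", "35days", "50days"],
    "Discount": [1000, 2300, 1500, 2100],
})

# Average the fees, total the discounts, and keep durations as a list
summary = df.groupby("Courses").agg(
    {"Fee": "mean", "Discount": "sum", "Duration": list}
)
print(summary)
```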

How do I group rows into lists in specific order?

groupby() preserves the order of rows within each group, so the lists follow the row order of the DataFrame. To get the lists in a specific order, sort the DataFrame with sort_values() before grouping and then call apply(list).
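A short sketch of sorting before grouping follows. The sample data and the variable names unsorted, ascending, and descending are illustrative assumptions.

```python
import pandas as pd

# Illustrative sample where the fees are not in sorted order
df = pd.DataFrame({
    "Courses": ["Python", "pandas", "Python", "pandas"],
    "Fee": [25000, 22000, 24000, 26000],
})

# Without sorting, each list follows the original row order
unsorted = df.groupby("Courses")["Fee"].apply(list)

# Sort by Fee first so each group's list comes out in ascending order
ascending = df.sort_values("Fee").groupby("Courses")["Fee"].apply(list)

# Sort in descending order for the reverse ordering within each group
descending = (
    df.sort_values("Fee", ascending=False)
      .groupby("Courses")["Fee"]
      .apply(list)
)

print(unsorted)
print(ascending)
print(descending)
```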

Conclusion

In this article, you have learned how to group DataFrame rows into lists in Pandas by using groupby() together with apply(list) and agg(list). You have also learned how to group rows into lists on all columns.

Happy Learning !!
