You can group DataFrame rows into a list by calling pandas.DataFrame.groupby() on the column of interest, selecting the column whose values you want collected, and then calling apply(list) on the result to get one list per group. In this article, I will explain how to group rows into lists using a few examples.
Key Points –
- The groupby() function is used to group DataFrame rows based on the values in one or more columns.
- Use .apply(list) or .agg(list) after grouping to convert the grouped values into lists.
- Use .reset_index() to flatten the grouped result and assign a new column name for the aggregated lists.
- You can aggregate multiple columns into lists by specifying them in the .agg() function.
- Use .agg() with a custom lambda function (lambda x: list(x)) for specific control over the aggregation process.
- The groupby() method can be chained with other operations (e.g., filtering, sorting, etc.) for more complex data transformations.
Quick Examples of Pandas Group Rows into List
Below are some quick examples of grouping rows into a list in a Pandas DataFrame.
# Quick examples pandas group rows into list
# Group Rows on 'Courses' column
# Get List for 'Fee' column
df2 = df.groupby('Courses')['Fee'].apply(list)
# Assign a column name to the grouped list
df2 = df.groupby('Courses')['Fee'].apply(list).reset_index(name="Course_Fee")
# Group rows into list
df2 = df.groupby("Courses").agg({"Discount": lambda x: list(x)})
# Group Rows into list on all columns
df2 = df.groupby("Courses").agg(list)
# Other way
df2 = df.groupby('Courses').agg(pd.Series.tolist)
Now, let's create a DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.
import pandas as pd
technologies = ({
'Courses':["Spark","PySpark","Hadoop","Python","pandas","PySpark","Python","pandas"],
'Fee' :[24000,25000,25000,24000,24000,25000,25000,24000],
'Duration':['30day','40days','35days', '40days','60days','50days','55days','35days'],
'Discount':[1000,2300,1500,1200,2500,2100,2000,2500]
})
df = pd.DataFrame(technologies)
print(df)
Yields below output.
   Courses    Fee Duration  Discount
0    Spark  24000    30day      1000
1  PySpark  25000   40days      2300
2   Hadoop  25000   35days      1500
3   Python  24000   40days      1200
4   pandas  24000   60days      2500
5  PySpark  25000   50days      2100
6   Python  25000   55days      2000
7   pandas  24000   35days      2500
Pandas DataFrame.groupby() To Group Rows into List
By using the DataFrame.groupby() function, you can group rows on a column, select the column whose values you want collected, and finally convert each group's values to a list using apply(list).
# Group Rows on 'Courses' column and get List for 'Fee' column
df2 = df.groupby('Courses')['Fee'].apply(list)
print(df2)
Note that df.groupby('Courses')['Fee'] returns a SeriesGroupBy object, and calling apply(list) on it builds one list per group and returns a Series indexed by Courses. This example yields the below output.
Courses
Hadoop [25000]
PySpark [25000, 25000]
Python [24000, 25000]
Spark [24000]
pandas [24000, 24000]
Name: Fee, dtype: object
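To see what happens at each step, here is a minimal sketch on a smaller DataFrame (a made-up subset of the data used in this article). It prints the type of the intermediate object and checks that agg(list) produces the same lists as apply(list). The variable names grouped, via_apply, and via_agg are only illustrative.

```python
import pandas as pd

# Small illustrative DataFrame (subset of the article's data)
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "PySpark", "Python"],
    "Fee": [24000, 25000, 24000, 25000, 25000],
})

# Selecting one column from a groupby gives a per-group view of that column
grouped = df.groupby("Courses")["Fee"]
print(type(grouped).__name__)

# Both calls collect each group's values into a Python list
via_apply = grouped.apply(list)
via_agg = grouped.agg(list)
print(via_apply)

# Compare the two results group by group
print(via_agg.tolist() == via_apply.tolist())
```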
Assign Column Name to Groupby List Result
Use .reset_index(name="Course_Fee") on the groupby() list result to turn it into a DataFrame and assign a column name to the list column.
# Assign a column name to the grouped list
df2 = df.groupby('Courses')['Fee'].apply(list).reset_index(name="Course_Fee")
print(df2)
Yields below output.
Courses Course_Fee
0 Hadoop [25000]
1 PySpark [25000, 25000]
2 Python [24000, 25000]
3 Spark [24000]
4 pandas [24000, 24000]
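As an alternative sketch, pandas named aggregation can set the output column name inside agg() itself, followed by a plain reset_index(). This assumes a pandas version that supports named aggregation (0.25 or later). The small DataFrame is made up for illustration.

```python
import pandas as pd

# Small illustrative DataFrame (subset of the article's data)
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "PySpark"],
    "Fee": [24000, 25000, 25000, 25000],
})

# Named aggregation: output_column=(source_column, aggregation_function)
df2 = df.groupby("Courses").agg(Course_Fee=("Fee", list)).reset_index()
print(df2)
```

Both approaches give a DataFrame with a Courses column and a Course_Fee column; which one reads better is largely a matter of taste.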
Group Rows into List Using agg() & Lambda Function
Alternatively, you can group rows into a list using df.groupby("Courses").agg({"Discount": lambda x: list(x)}). Use the groupby() method on the Courses column and the agg() method to apply the aggregation to every group of the DataFrame.
# Group Rows into list
df2 = df.groupby("Courses").agg({"Discount": lambda x: list(x)})
print(df2)
Yields below output.
              Discount
Courses
Hadoop          [1500]
PySpark   [2300, 2100]
Python    [1200, 2000]
Spark           [1000]
pandas    [2500, 2500]
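The same dictionary form extends to several columns at once. The sketch below passes the built-in list directly instead of a lambda, which behaves the same here, and aggregates only the Fee and Discount columns. The DataFrame is a made-up subset of the article's data.

```python
import pandas as pd

# Small illustrative DataFrame (subset of the article's data)
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "PySpark"],
    "Fee": [24000, 25000, 25000, 25000],
    "Duration": ["30day", "40days", "35days", "50days"],
    "Discount": [1000, 2300, 1500, 2100],
})

# Map each column you want to the aggregation to apply;
# columns left out of the dictionary (Duration) are dropped from the result
df2 = df.groupby("Courses").agg({"Fee": list, "Discount": list})
print(df2)
```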
Pandas Group Rows into List on All Columns
Let's see how to group rows into lists for all DataFrame columns. This results in one list column for each remaining column, with one row per group.
# Group Rows into List on All columns
df2 = df.groupby("Courses").agg(list)
print(df2)
Yields below output.
                    Fee          Duration      Discount
Courses
Hadoop          [25000]          [35days]        [1500]
PySpark  [25000, 25000]  [40days, 50days]  [2300, 2100]
Python   [24000, 25000]  [40days, 55days]  [1200, 2000]
Spark           [24000]           [30day]        [1000]
pandas   [24000, 24000]  [60days, 35days]  [2500, 2500]
You can also get the same result by passing pd.Series.tolist to agg().
# Using .agg(pd.Series.tolist) as the argument on the DataFrame
df2 = df.groupby('Courses').agg(pd.Series.tolist)
print(df2)
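As a small sanity check, the sketch below confirms on a made-up subset of the data that each cell of the result holds an ordinary Python list, and that agg(list) and agg(pd.Series.tolist) produce the same values. The names by_list and by_tolist are illustrative only.

```python
import pandas as pd

# Small illustrative DataFrame (subset of the article's data)
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "PySpark"],
    "Fee": [24000, 25000, 25000, 25000],
    "Duration": ["30day", "40days", "35days", "50days"],
})

by_list = df.groupby("Courses").agg(list)
by_tolist = df.groupby("Courses").agg(pd.Series.tolist)

# Look up one group's list by index label and column name
pyspark_durations = by_list.loc["PySpark", "Duration"]
print(type(pyspark_durations).__name__, pyspark_durations)

# Compare the two results cell by cell
print(by_list.values.tolist() == by_tolist.values.tolist())
```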
Complete Example For Reference
import pandas as pd
technologies = ({
'Courses':["Spark","PySpark","Hadoop","Python","pandas","PySpark","Python","pandas"],
'Fee' :[24000,25000,25000,24000,24000,25000,25000,24000],
'Duration':['30day','40days','35days', '40days','60days','50days','55days','35days'],
'Discount':[1000,2300,1500,1200,2500,2100,2000,2500]
})
df = pd.DataFrame(technologies)
print(df)
# Group rows on 'Courses' and collect the 'Fee' values into a list
df2 = df.groupby('Courses')['Fee'].apply(list)
print(df2)
# Assign a column name to the grouped list
df2 = df.groupby('Courses')['Fee'].apply(list).reset_index(name="Course_Fee")
print(df2)
# Using a lambda function with agg()
df2 = df.groupby("Courses").agg({"Discount": lambda x: list(x)})
print(df2)
# Using list as the aggregation function on all columns
df2 = df.groupby("Courses").agg(list)
print(df2)
# Using pd.Series.tolist as the aggregation function on all columns
df2 = df.groupby('Courses').agg(pd.Series.tolist)
print(df2)
FAQ on Pandas Group Rows into List Using groupby()
Grouping rows into a list is useful when you want to consolidate data from multiple rows into a single row for each group, making it easier to analyze or process grouped data in a compact format.
To group rows into a list in pandas, you can use the .groupby() method along with .agg(list).
You can group multiple columns into lists in pandas. Call .agg(list) on the grouped columns you want to aggregate into lists.
You can customize the aggregation when using pandas groupby(). Instead of aggregating into lists, you can apply custom functions or predefined aggregations like sum, mean, max, or any other operation.
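For instance, a minimal sketch of both options on made-up data: a list of predefined aggregation names, and a custom lambda. The fee_range calculation is purely hypothetical and only shows the shape of a custom function.

```python
import pandas as pd

# Small illustrative DataFrame (subset of the article's data)
df = pd.DataFrame({
    "Courses": ["PySpark", "Python", "PySpark", "Python"],
    "Fee": [25000, 24000, 25000, 25000],
})

# Predefined aggregations: one output column per function name
summary = df.groupby("Courses")["Fee"].agg(["sum", "mean", "max"])
print(summary)

# Custom aggregation: spread between the highest and lowest fee per course
fee_range = df.groupby("Courses")["Fee"].agg(lambda x: x.max() - x.min())
print(fee_range)
```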
To group rows into lists in a specific order, call sort_values() before grouping, then use groupby() followed by apply(list). groupby() preserves the order of rows within each group, so each list comes out in the sorted order.
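A minimal sketch of the ordering behavior, using a made-up subset of the article's data: the same grouping is run with and without a preceding sort_values() so the difference inside each list is visible.

```python
import pandas as pd

# Small illustrative DataFrame (subset of the article's data)
df = pd.DataFrame({
    "Courses": ["PySpark", "Python", "PySpark", "Python"],
    "Discount": [2300, 1200, 2100, 2000],
})

# Without sorting, each list follows the original row order
unsorted_lists = df.groupby("Courses")["Discount"].apply(list)
print(unsorted_lists)

# Sorting first controls the order of values inside each list
sorted_lists = df.sort_values("Discount").groupby("Courses")["Discount"].apply(list)
print(sorted_lists)
```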
Conclusion
In this article, you have learned how to group DataFrame rows into lists in Pandas by using groupby() together with apply(list) and agg(list). You have also learned how to group rows into lists on all columns.
Happy Learning !!