You can group DataFrame rows into a list by calling pandas.DataFrame.groupby() on the column of interest, selecting the column whose values you want as a list, and then calling apply(list) to get a list for every group. In this article, I will explain how to group rows into a list using a few examples.
1. Quick Examples
Below are some quick examples of grouping rows into a list in a pandas DataFrame.
# Group Rows on 'Courses' column and get List for 'Fee' column
df2 = df.groupby('Courses')['Fee'].apply(list)
# Assign a Column Name to the grouped list
df2 = df.groupby('Courses')['Fee'].apply(list).reset_index(name="Course_Fee")
# Group Rows into List
df2 = df.groupby("Courses").agg({"Discount": lambda x: list(x)})
# Group Rows into List on All columns
df2 = df.groupby("Courses").agg(list)
# Another way: pass pd.Series.tolist to agg()
df2 = df.groupby('Courses').agg(pd.Series.tolist)
Now, let’s create a DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.
import pandas as pd
technologies = ({
'Courses':["Spark","PySpark","Hadoop","Python","pandas","PySpark","Python","pandas"],
'Fee' :[24000,25000,25000,24000,24000,25000,25000,24000],
'Duration':['30day','40days','35days', '40days','60days','50days','55days','35days'],
'Discount':[1000,2300,1500,1200,2500,2100,2000,2500]
})
df = pd.DataFrame(technologies)
print(df)
This yields the output below.
Courses Fee Duration Discount
0 Spark 24000 30day 1000
1 PySpark 25000 40days 2300
2 Hadoop 25000 35days 1500
3 Python 24000 40days 1200
4 pandas 24000 60days 2500
5 PySpark 25000 50days 2100
6 Python 25000 55days 2000
7 pandas 24000 35days 2500
2. Pandas DataFrame.groupby() To Group Rows into List
By using the DataFrame.groupby() function, you can group rows on a column, select the column you want as a list from the grouped result, and finally convert each group to a list using apply(list).
# Group Rows on 'Courses' column and get List for 'Fee' column
df2 = df.groupby('Courses')['Fee'].apply(list)
print(df2)
This yields the output below. Note that df.groupby('Courses')['Fee'] returns a SeriesGroupBy object, not a plain Series. Calling apply(list) on it collects each group's Fee values into a list and returns a Series indexed by Courses.
Courses
Hadoop [25000]
PySpark [25000, 25000]
Python [24000, 25000]
Spark [24000]
pandas [24000, 24000]
Name: Fee, dtype: object
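To see what each step returns, the snippet below checks the types on a smaller, made-up DataFrame that uses the same column names (the sample values are an assumption for illustration). Selecting a column after groupby() gives a SeriesGroupBy, and apply(list) turns it into an ordinary Series whose index is the group key. In this sketch agg(list) builds the same per-group lists.

```python
import pandas as pd

# Small hypothetical sample with the same column names as the article
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "PySpark"],
    "Fee": [24000, 25000, 25000],
})

# Selecting a column after groupby() gives a SeriesGroupBy, not a Series
grouped = df.groupby("Courses")["Fee"]
print(type(grouped).__name__)  # SeriesGroupBy

# apply(list) returns a regular Series indexed by the group key
fee_lists = grouped.apply(list)
print(type(fee_lists).__name__)  # Series
print(fee_lists)

# agg(list) produces the same per-group lists
print(grouped.agg(list).to_dict() == fee_lists.to_dict())
```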
3. Assign Column Name to Groupby List Result
On the groupby() list result, use .reset_index(name="Course_Fee") to move Courses back into a regular column and assign a name to the list column.
# Assign a Column Name to the grouped list
df2 = df.groupby('Courses')['Fee'].apply(list).reset_index(name="Course_Fee")
print(df2)
This yields the output below.
Courses Course_Fee
0 Hadoop [25000]
1 PySpark [25000, 25000]
2 Python [24000, 25000]
3 Spark [24000]
4 pandas [24000, 24000]
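The name= argument works here because apply(list) returns a Series, and Series.reset_index(name=...) uses that name for the values column it creates. As a sketch on a small hypothetical DataFrame, renaming the Series first and then resetting the index should give the same two columns; Course_Fee is simply the label used in this article.

```python
import pandas as pd

# Small hypothetical sample
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "PySpark"],
    "Fee": [24000, 25000, 25000],
})

# Option 1: name the list column while moving the index back to a column
via_name = df.groupby("Courses")["Fee"].apply(list).reset_index(name="Course_Fee")

# Option 2: rename the resulting Series first, then reset the index
via_rename = df.groupby("Courses")["Fee"].apply(list).rename("Course_Fee").reset_index()

print(via_name)
print(list(via_name.columns))  # ['Courses', 'Course_Fee']
print(via_name.to_dict("list") == via_rename.to_dict("list"))
```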
4. Group Rows into List Using agg() & Lambda Function
Alternatively, you can group rows into a list with df.groupby("Courses").agg({"Discount": lambda x: list(x)}). Here groupby() groups the rows on the Courses column, and agg() applies the lambda to the Discount column of every group, turning its values into a list.
# Group Rows into List
df2 = df.groupby("Courses").agg({"Discount": lambda x: list(x)})
print(df2)
This yields the output below.
Discount
Courses
Hadoop [1500]
PySpark [2300, 2100]
Python [1200, 2000]
Spark [1000]
pandas [2500, 2500]
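If you want to pick the name of each list column inside agg() itself, pandas named aggregation (keyword=(column, function) pairs, available since pandas 0.25) is one option. The sketch below uses a small hypothetical DataFrame, and the output names Discount_List and Fee_List are illustrative choices, not anything pandas requires.

```python
import pandas as pd

# Small hypothetical sample
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "PySpark"],
    "Fee": [24000, 25000, 25000, 25000],
    "Discount": [1000, 2300, 1500, 2100],
})

# Each keyword becomes an output column: (source column, aggregation function)
df2 = df.groupby("Courses").agg(
    Discount_List=("Discount", list),
    Fee_List=("Fee", list),
)
print(df2)
```

This avoids a separate rename step when you need several list columns with custom names.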
5. Pandas Group Rows into List on All Columns
Let’s see how to group rows into a list for all DataFrame columns. This produces one list column for each non-grouping column, with one row per group.
# Group Rows into List on All columns
df2 = df.groupby("Courses").agg(list)
print(df2)
This yields the output below.
Fee Duration Discount
Courses
Hadoop [25000] [35days] [1500]
PySpark [25000, 25000] [40days, 50days] [2300, 2100]
Python [24000, 25000] [40days, 55days] [1200, 2000]
Spark [24000] [30day] [1000]
pandas [24000, 24000] [60days, 35days] [2500, 2500]
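By default groupby() sorts the group keys, which is why Hadoop comes first and pandas comes last in the output above (uppercase letters sort before lowercase). Within each list, values keep the order in which their rows appear in the original DataFrame. If you prefer groups in first-appearance order, passing sort=False to groupby() is one way to get it. A minimal sketch on a hypothetical subset of the data:

```python
import pandas as pd

# Hypothetical subset of the article's data
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "PySpark"],
    "Duration": ["30day", "40days", "35days", "50days"],
})

# Default: group keys are sorted
print(df.groupby("Courses").agg(list).index.tolist())
# ['Hadoop', 'PySpark', 'Spark']

# sort=False: groups appear in the order their key first occurs in df
unsorted = df.groupby("Courses", sort=False).agg(list)
print(unsorted.index.tolist())
# ['Spark', 'PySpark', 'Hadoop']

# Values inside each list follow the original row order
print(unsorted.loc["PySpark", "Duration"])
# ['40days', '50days']
```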
You can also get the same result by passing pd.Series.tolist to agg().
# Using .agg(pd.Series.tolist) as the argument on the DataFrame
df2 = df.groupby('Courses').agg(pd.Series.tolist)
print(df2)
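To check that agg(list) and agg(pd.Series.tolist) agree, one simple approach is to compare both results as plain dictionaries, since the lists stored in the cells are ordinary Python objects. This sketch uses a hypothetical subset of the article's data.

```python
import pandas as pd

# Hypothetical subset of the article's data
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "PySpark", "Python"],
    "Fee": [24000, 25000, 24000, 25000, 25000],
    "Discount": [1000, 2300, 1200, 2100, 2000],
})

with_list = df.groupby("Courses").agg(list)
with_tolist = df.groupby("Courses").agg(pd.Series.tolist)

# Compare {column: {group key: list of values}} dictionaries
print(with_list.to_dict() == with_tolist.to_dict())
```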
6. Complete Example For Reference
import pandas as pd
technologies = ({
'Courses':["Spark","PySpark","Hadoop","Python","pandas","PySpark","Python","pandas"],
'Fee' :[24000,25000,25000,24000,24000,25000,25000,24000],
'Duration':['30day','40days','35days', '40days','60days','50days','55days','35days'],
'Discount':[1000,2300,1500,1200,2500,2100,2000,2500]
})
df = pd.DataFrame(technologies)
print(df)
# Group rows on 'Courses' and collect 'Fee' values into a list per group
df2 = df.groupby('Courses')['Fee'].apply(list)
print(df2)
# Same as above, but return a DataFrame with a named list column
df2 = df.groupby('Courses')['Fee'].apply(list).reset_index(name="Course_Fee")
print(df2)
# Use agg() with a lambda to collect 'Discount' values into a list
df2 = df.groupby("Courses").agg({"Discount": lambda x: list(x)})
print(df2)
# Pass list to agg() to collect every non-grouping column into lists
df2 = df.groupby("Courses").agg(list)
print(df2)
# Same result using pd.Series.tolist as the aggregation function
df2 = df.groupby('Courses').agg(pd.Series.tolist)
print(df2)
Conclusion
In this article, you have learned how to group DataFrame rows into a list in pandas by using groupby() together with apply() and agg(). You have also learned how to group rows into a list on all columns.
Happy Learning !!