In Pandas, the explode() method transforms each element of a list-like column into a separate row, replicating the index value (and the values of the other columns) for each new row. This is particularly useful when a DataFrame column contains lists or arrays and you want to expand those lists into individual rows.
In this article, I will explain the Pandas DataFrame explode() method, covering its syntax, parameters, and usage, and show how it returns a DataFrame with the list elements of the specified column expanded into separate rows.
Key Points –
- The explode() method transforms each element of a list-like column into a separate row, replicating the index values.
- It is typically applied to a single column that contains list-like elements; since pandas 1.3.0, it also accepts a list of columns to explode together.
- Rows where the specified column contains an empty list produce a single row with NaN in the exploded output.
- The method returns a new DataFrame in which each list element from the original column is placed in its own row.
- Multiple columns can also be exploded by chaining explode() calls, which expands each list-like column in turn and yields every combination of their elements within each original row.
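The key points above can be checked on a tiny example. This is a minimal sketch with made-up data (the column names A and B and their values are illustrative only); it shows the repeated index label, the NaN produced by an empty list, and a scalar string passing through unchanged.

```python
import pandas as pd

# Illustrative data: a two-element list, an empty list, and a plain string
df = pd.DataFrame({'A': [[1, 2], [], 'x'], 'B': ['p', 'q', 'r']})

out = df.explode('A')
print(out)
# Label 0 is repeated for both elements of [1, 2]
print(out.index.tolist())   # [0, 0, 1, 2]
# The empty list becomes one NaN row; the scalar 'x' is kept as-is
```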
Syntax of Pandas DataFrame explode() Function
Following is the syntax of the Pandas DataFrame explode() method.
# Syntax of Pandas DataFrame explode()
DataFrame.explode(column, ignore_index=False)
Parameters of the DataFrame explode()
Following are the parameters of the DataFrame explode() function.
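column – str, tuple, or list of these. The column to explode. A tuple refers to a single column label in a DataFrame whose columns are a MultiIndex. Since pandas 1.3.0, a list of labels explodes several columns at once; in that case, the list-likes in each row must have matching lengths.
ignore_index – bool, default False. If True, the resulting DataFrame gets a fresh integer index from 0 to n - 1 instead of repeating the original index labels.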
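A short sketch of both parameters, using small hypothetical DataFrames. The tuple form only applies when the columns are a MultiIndex, so a two-level column index is built here just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [['a', 'b'], ['c']], 'B': [1, 2]})

# Default ignore_index=False: the original labels repeat
print(df.explode('A').index.tolist())                     # [0, 0, 1]

# ignore_index=True: a fresh 0..n-1 integer index
print(df.explode('A', ignore_index=True).index.tolist())  # [0, 1, 2]

# A tuple selects one column when the columns form a MultiIndex
mdf = pd.DataFrame({('x', 'vals'): [[1, 2]], ('x', 'label'): ['p']})
print(mdf.explode(('x', 'vals')))
```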
Return Value
It returns a DataFrame where the specified column’s list elements are expanded into separate rows.
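Because explode() returns a new DataFrame, the original is left as it was. The exploded column also comes back with object dtype, so a cast may be needed before numeric work. A minimal sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({'A': [[1, 2], [3]], 'B': ['x', 'y']})
exploded = df.explode('A')

# explode() returns a new object; the original rows and lists are untouched
print(len(df), len(exploded))   # 2 3
print(df['A'].tolist())         # [[1, 2], [3]]

# The exploded column has object dtype; cast it if you need integers
print(exploded['A'].dtype)      # object
as_int = exploded['A'].astype('int64')
```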
Usage of Pandas DataFrame explode() Method
The explode() method in Pandas is highly useful when you need to transform each element of a list-like column into a separate row. It is commonly used in data preprocessing and cleaning, especially when dealing with nested data structures or lists within DataFrame cells.
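One common preprocessing pattern, shown here as a sketch with a hypothetical tags column, is to split a delimited string into a list with Series.str.split() and then explode it so each tag gets its own row. Passing ignore_index=True renumbers the rows from 0.

```python
import pandas as pd

# Hypothetical data: comma-separated tags stored in one string column
df = pd.DataFrame({'id': [1, 2], 'tags': ['spark,pandas', 'python']})

# str.split() turns each string into a list; explode() puts each tag on its own row
tags = df.assign(tags=df['tags'].str.split(',')).explode('tags', ignore_index=True)
print(tags)
```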
To run some examples of the Pandas DataFrame explode() method, let’s create a Pandas DataFrame from a dictionary with columns A, B, and C.
# Create pandas DataFrame
import pandas as pd
import numpy as np
technologies = {
    'A': [["Spark", "PySpark", "Pandas"], 'Course', [], ["Java", "Python"]],
    'B': [25000, 15000, 30000, 20000],
    'C': [['30days', '40days', '35days'], np.nan, [], ['40days', '55days']],
}
df = pd.DataFrame(technologies)
print("Original DataFrame:\n", df)
Yields below output.
You can use the DataFrame.explode() method to transform each element of column A into a separate row, so that every value in a list becomes its own row. Scalar values such as 'Course' are left unchanged, and the empty list in row 2 becomes a single row with a NaN value.
# Explode the list-like column 'A'
df_exploded = df.explode('A')
print("DataFrame after exploding column 'A':\n", df_exploded)
Yields below output.
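To check the result in code rather than by eye, the sketch below rebuilds the same DataFrame and inspects the exploded column. It assumes the technologies data shown earlier.

```python
import numpy as np
import pandas as pd

technologies = {
    'A': [["Spark", "PySpark", "Pandas"], 'Course', [], ["Java", "Python"]],
    'B': [25000, 15000, 30000, 20000],
    'C': [['30days', '40days', '35days'], np.nan, [], ['40days', '55days']],
}
df = pd.DataFrame(technologies)
df_exploded = df.explode('A')

# Original index labels repeat once per element of each list in column A
print(df_exploded.index.tolist())   # [0, 0, 0, 1, 2, 3, 3]
# 'Course' is a scalar and stays as-is; the empty list in row 2 becomes NaN
print(df_exploded.loc[1, 'A'], df_exploded.loc[2, 'A'])   # Course nan
```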
Explode Multiple List-Like Columns
To explode multiple list-like columns in a Pandas DataFrame, you can chain the explode() method for each column you want to expand.
# Using explode to expand multiple columns
df_exploded = df.explode('A').explode('B').explode('C')
print("DataFrame after exploding columns 'A', 'B', and 'C':\n", df_exploded)
In the above example, the explode() method is applied in sequence to columns A, B, and C. Column B holds only scalar integers, so explode('B') leaves it unchanged. Because A and C are exploded one after the other, each original row yields every combination of its A and C elements (3 × 3 = 9 rows for index 0 and 2 × 2 = 4 rows for index 3), while the index label and the value in B are repeated on each new row. This example yields the below output.
# Output:
DataFrame after exploding columns 'A', 'B', and 'C':
A B C
0 Spark 25000 30days
0 Spark 25000 40days
0 Spark 25000 35days
0 PySpark 25000 30days
0 PySpark 25000 40days
0 PySpark 25000 35days
0 Pandas 25000 30days
0 Pandas 25000 40days
0 Pandas 25000 35days
1 Course 15000 NaN
2 NaN 30000 NaN
3 Java 20000 40days
3 Java 20000 55days
3 Python 20000 40days
3 Python 20000 55days
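Chaining is not the only option. Since pandas 1.3.0, explode() also accepts a list of columns and expands them together, pairing elements position by position instead of forming every combination. The sketch below assumes pandas 1.3.0 or later and reuses the technologies data; the list-likes in each row must have matching lengths (pandas counts scalars and empty lists as length 1), otherwise a ValueError is raised.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [["Spark", "PySpark", "Pandas"], 'Course', [], ["Java", "Python"]],
    'B': [25000, 15000, 30000, 20000],
    'C': [['30days', '40days', '35days'], np.nan, [], ['40days', '55days']],
})

# Explode A and C together: Spark pairs with 30days, PySpark with 40days, ...
paired = df.explode(['A', 'C'])
print(paired)

# Compare with chaining, which produces every A x C combination per row
chained = df.explode('A').explode('C')
print(len(paired), len(chained))   # 7 15
```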
Explode with NaN Values
When you use the explode() method on a column that contains NaN values, those NaN values are preserved in the resulting DataFrame rather than expanded.
# Explode column 'C'
df_exploded = df.explode('C')
print("DataFrame after exploding column 'C'\n", df_exploded)
In the above example, the explode() method expands each list element in column C into a separate row. The NaN in row 1 is kept as-is and the empty list in row 2 becomes NaN, so both rows remain in the exploded DataFrame. This example yields the below output.
# Output:
DataFrame after exploding column 'C'
A B C
0 [Spark, PySpark, Pandas] 25000 30days
0 [Spark, PySpark, Pandas] 25000 40days
0 [Spark, PySpark, Pandas] 25000 35days
1 Course 15000 NaN
2 [] 30000 NaN
3 [Java, Python] 20000 40days
3 [Java, Python] 20000 55days
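Since NaN rows survive the explode, removing them is a separate step. A minimal sketch, assuming you want to discard rows whose C value is missing after exploding, uses dropna() with subset:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [["Spark", "PySpark", "Pandas"], 'Course', [], ["Java", "Python"]],
    'B': [25000, 15000, 30000, 20000],
    'C': [['30days', '40days', '35days'], np.nan, [], ['40days', '55days']],
})

exploded = df.explode('C')
# Row 1 (NaN) and row 2 (empty list) both remain, each with NaN in C
print(exploded['C'].isna().sum())    # 2

# Drop those rows explicitly if they are not wanted
cleaned = exploded.dropna(subset=['C'])
print(len(exploded), len(cleaned))   # 7 5
```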
Exploding a Column with Tuples
Similarly, exploding a column of tuples works the same way as exploding a column of lists.
import pandas as pd
# Create DataFrame with tuples
data = {
'A': [(10, 20), (30, 40), (50, 60)],
'B': ["Spark","PySpark","Pandas"]
}
df = pd.DataFrame(data)
print("Original DataFrame:\n", df)
# Explode column 'A'
df_exploded = df.explode('A')
print("DataFrame after exploding column 'A':\n", df_exploded)
In the above example, the explode() method is applied to column A, which splits each tuple into separate rows. The corresponding values in column B are replicated for each new row created from the tuples in A. This example yields the below output.
# Output:
Original DataFrame:
A B
0 (10, 20) Spark
1 (30, 40) PySpark
2 (50, 60) Pandas
DataFrame after exploding column 'A':
A B
0 10 Spark
0 20 Spark
1 30 PySpark
1 40 PySpark
2 50 Pandas
2 60 Pandas
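Tuples are only one kind of list-like. The sketch below, using a hypothetical mixed column, shows a tuple, a NumPy array, and a set being expanded, while a plain string is treated as a scalar and kept whole rather than split into characters.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing container types with a plain string
df = pd.DataFrame({'A': [(1, 2), np.array([3, 4]), {5}, 'abc'], 'B': list('wxyz')})

out = df.explode('A')
# Tuple, array, and set elements each get a row; 'abc' stays one value
print(out)
```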
Exploding Nested Lists
Finally, exploding nested lists requires applying the explode() method multiple times, once for each level of nesting.
import pandas as pd
# Create DataFrame with nested lists
data = {
'A': [[["Spark","PySpark","Pandas"]], [['Hadoop'], ['R Programming'], ['Hyperion']], [['C++'], ["Java","Python"]]],
'B': [10, 20, 30]
}
df = pd.DataFrame(data)
print("Original DataFrame:\n", df)
# Explode nested lists in column 'A'
df_exploded = df.explode('A').explode('A')
print("DataFrame after exploding nested lists in column 'A':\n", df_exploded)
In the above example, the explode() method is applied twice to column A: first to expand the outer lists and then to expand the inner lists. As a result, each nested element in column A ends up in its own row, while the corresponding values in column B are preserved. This example yields the below output.
# Output:
Original DataFrame:
A B
0 [[Spark, PySpark, Pandas]] 10
1 [[Hadoop], [R Programming], [Hyperion]] 20
2 [[C++], [Java, Python]] 30
DataFrame after exploding nested lists in column 'A':
A B
0 Spark 10
0 PySpark 10
0 Pandas 10
1 Hadoop 20
1 R Programming 20
1 Hyperion 20
2 C++ 30
2 Java 30
2 Python 30
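Chaining explode() works when the nesting depth is known in advance. When it varies from row to row, one option is to keep exploding until no list-like values remain. The explode_fully() function below is a hypothetical helper written for this illustration, not part of pandas; it relies on pandas.api.types.is_list_like() to detect remaining containers.

```python
import pandas as pd
from pandas.api.types import is_list_like

def explode_fully(df, column):
    """Hypothetical helper: call explode() until `column` holds no list-likes."""
    while df[column].map(is_list_like).any():
        df = df.explode(column)
    return df

# Rows nested to different depths
data = {'A': [[1, [2, [3]]], [[4]]], 'B': ['x', 'y']}
flat = explode_fully(pd.DataFrame(data), 'A')
print(flat)
```

Each pass removes one level of nesting, and the loop stops once every value in the column is a scalar. Empty lists turn into NaN on the pass that reaches them, so ordinary nested lists always flatten in a finite number of passes.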
Frequently Asked Questions on Pandas DataFrame explode() Method
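What does the explode() method do in Pandas?
The explode() method transforms each element of a list-like column into a separate row, expanding the DataFrame while repeating the index value (and the values of the other columns) for each new row.
Can I explode multiple columns at once?
Yes. Since pandas 1.3.0, you can pass a list of columns, for example df.explode(['A', 'C']); the list-likes in each row must then have matching lengths. You can also chain explode() calls to explode several columns one after another, which produces every combination of their elements.
How does explode() handle NaN or None values?
The explode() method preserves NaN and None values in the resulting DataFrame. Such entries are not list-like, so they remain unchanged and are not expanded. Empty list-likes, by contrast, become NaN.
What types of data can be exploded?
The explode() method works with list-like values, including lists, tuples, sets, Series, and NumPy arrays. Strings are treated as scalars and are not split into characters.
How do I handle nested lists?
Apply the explode() method multiple times, once per level of nesting, to fully expand all levels of the nested lists.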
Conclusion
In this article, I have explained the Pandas DataFrame explode() method, covering its syntax, parameters, and usage. This method transforms each element of a specified list-like column into a separate row, replicating the index values for the new rows. When you chain explode() over several columns, each column is expanded in turn; when you pass a list of columns (pandas 1.3.0 and later), they are expanded together.
Happy Learning!!