Pandas DataFrame explode() Method

In Pandas, the explode() method transforms each element of a list-like column into a separate row, repeating the index label and the values of the other columns for each new row. This is particularly useful when a DataFrame column contains lists or arrays and you want to expand them into individual rows.

Syntax of Pandas DataFrame explode() Function

Following is the syntax of the Pandas DataFrame explode() method.


# Syntax of Pandas DataFrame explode()
DataFrame.explode(column, ignore_index=False)

Parameters of the DataFrame explode()

Following are the parameters of the DataFrame explode() function.

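column – str, tuple, or a list of these. The column(s) to explode. A tuple identifies a single column when the DataFrame has MultiIndex columns. In pandas 1.3 and later, a list of column names explodes several columns together, provided their list-likes have matching lengths on each row.
ignore_index – bool, default False. If True, the resulting DataFrame is relabeled with a continuous integer index 0, 1, …, n - 1 instead of repeating the original index labels.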
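
As a quick illustration of the ignore_index parameter, here is a minimal sketch. The frame and its column names (tags, n) are made up for this example and are not part of the article's data.

```python
import pandas as pd

# Small illustrative frame (hypothetical data, not from the article)
df = pd.DataFrame({"tags": [["a", "b"], ["c"]], "n": [1, 2]})

# Default: the original index label is repeated for each exploded element
kept = df.explode("tags")
print(kept.index.tolist())    # [0, 0, 1]

# ignore_index=True: the result gets a fresh 0..n-1 index
reset = df.explode("tags", ignore_index=True)
print(reset.index.tolist())   # [0, 1, 2]
```

Keeping the repeated labels is handy when you later need to group the rows back by their original position; resetting is usually more convenient when the result is treated as a new table.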

Return Value

It returns a DataFrame where the specified column’s list elements are expanded into separate rows.

Usage of Pandas DataFrame explode() Method

The explode() method in Pandas is highly useful when you need to transform each element of a list-like column into a separate row. This method is commonly used in data preprocessing and cleaning, especially when dealing with nested data structures or lists within DataFrame cells.
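
One common preprocessing pattern, sketched here under the assumption that the raw data stores several values in a single delimited string, is to split the string into a list and then explode it. The frame and the names raw, tidy, name, and skills are hypothetical.

```python
import pandas as pd

# Hypothetical raw data: one comma-delimited string of skills per person
raw = pd.DataFrame({"name": ["Ann", "Bob"], "skills": ["Spark,Pandas", "Java"]})

# Split each string into a list, then give every skill its own row
tidy = (
    raw.assign(skills=raw["skills"].str.split(","))
       .explode("skills", ignore_index=True)
)
print(tidy)
```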

To run some examples of the Pandas DataFrame explode() method, let’s create a Pandas DataFrame using data from a dictionary, with columns A, B, and C.


# Create pandas DataFrame
import pandas as pd
import numpy as np
technologies = (
    {'A': [["Spark","PySpark","Pandas"], 'Course', [], ["Java","Python"]],
     'B': [25000,15000,30000,20000],
     'C': [['30days','40days','35days'], np.nan, [], ['40days','55days']]})
df = pd.DataFrame(technologies)
print("Original DataFrame:\n", df)

You can use the DataFrame.explode() function to expand a single column, here A, so that each value in a list becomes its own row. Scalar entries such as 'Course' are left as they are, and an empty list is expanded into a single row containing NaN.


# Explode the list-like column 'A'
df_exploded = df.explode('A')
print("DataFrame after exploding column 'A':\n", df_exploded)

Explode Multiple List-Like Columns

To explode multiple list-like columns in a Pandas DataFrame, you can chain the explode() method, calling it once for each column you want to expand.


# Using explode to expand multiple columns
df_exploded = df.explode('A').explode('B').explode('C')
print("DataFrame after exploding columns 'A', 'B', and 'C':\n", df_exploded)

In the above example, the explode() method is applied sequentially to columns A, B, and C. Column B holds scalars, so exploding it changes nothing. Because A and C are exploded independently, each original row expands into every combination of its A and C elements (3 x 3 = 9 rows for index 0), with the B value repeated on each. This example yields the below output.


# Output:
DataFrame after exploding columns 'A', 'B', and 'C':
          A      B       C
0    Spark  25000  30days
0    Spark  25000  40days
0    Spark  25000  35days
0  PySpark  25000  30days
0  PySpark  25000  40days
0  PySpark  25000  35days
0   Pandas  25000  30days
0   Pandas  25000  40days
0   Pandas  25000  35days
1   Course  15000     NaN
2      NaN  30000     NaN
3     Java  20000  40days
3     Java  20000  55days
3   Python  20000  40days
3   Python  20000  55days
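
If the goal is to pair elements position by position rather than combine them, pandas 1.3 and later accept a list of columns. The sketch below contrasts the two forms; the frame df2 is hypothetical and assumes the lists in A and C have matching lengths on every row, which the list form requires.

```python
import pandas as pd

# Hypothetical frame: lists in 'A' and 'C' have matching lengths on every row
df2 = pd.DataFrame({
    "A": [["Spark", "PySpark"], ["Java"]],
    "B": [25000, 20000],
    "C": [["30days", "40days"], ["55days"]],
})

# List form (pandas 1.3+): elements are paired position by position
paired = df2.explode(["A", "C"])
print(paired)

# Chained form: every element of 'A' is combined with every element of 'C'
crossed = df2.explode("A").explode("C")
print(len(paired), len(crossed))   # 3 5
```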

Explode with NaN Values

When a column contains NaN values, the explode() method leaves them in place: a NaN entry is not list-like, so it stays as a single row holding NaN in the resulting DataFrame.


# Explode column 'C'
df_exploded = df.explode('C')
print("DataFrame after exploding column 'C'\n", df_exploded)

In the above example, the explode() method expands each list element in column C into a separate row, while column A is left untouched. The NaN in row 1 is kept as a single row, and the empty list in row 2 also becomes NaN; neither row is dropped. This example yields the below output.


# Output:
DataFrame after exploding column 'C'
                           A      B       C
0  [Spark, PySpark, Pandas]  25000  30days
0  [Spark, PySpark, Pandas]  25000  40days
0  [Spark, PySpark, Pandas]  25000  35days
1                    Course  15000     NaN
2                        []  30000     NaN
3            [Java, Python]  20000  40days
3            [Java, Python]  20000  55days
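
Since explode() keeps the NaN rows, removing them is a separate step. A minimal sketch, using a hypothetical frame df3 with a list, a NaN, and an empty list in column C, and dropna() as the follow-up:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: a list, a NaN, and an empty list in column 'C'
df3 = pd.DataFrame({
    "B": [25000, 15000, 30000],
    "C": [["30days", "40days"], np.nan, []],
})

exploded = df3.explode("C")
print(exploded["C"].isna().sum())   # 2: one from the NaN, one from the empty list

# Optional follow-up step: discard rows that have no value in 'C'
cleaned = exploded.dropna(subset=["C"])
print(cleaned)
```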

Exploding a Column with Tuples

Exploding a column of tuples works in the same way as exploding a column of lists.


import pandas as pd

# Create DataFrame with tuples
data = {
    'A': [(10, 20), (30, 40), (50, 60)],
    'B': ["Spark","PySpark","Pandas"]
}

df = pd.DataFrame(data)
print("Original DataFrame:\n", df)

# Explode column 'A'
df_exploded = df.explode('A')
print("DataFrame after exploding column 'A':\n", df_exploded)

In the above example, the explode() method is applied to column A, which splits each tuple into separate rows. The corresponding values in column B are replicated for each new row created from the tuples in A. This example yields the below output.


# Output:
Original DataFrame:
           A        B
0  (10, 20)    Spark
1  (30, 40)  PySpark
2  (50, 60)   Pandas
DataFrame after exploding column 'A':
     A        B
0  10    Spark
0  20    Spark
1  30  PySpark
1  40  PySpark
2  50   Pandas
2  60   Pandas
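
One detail worth checking after an explode is the column's dtype: the exploded column comes back as object, even when every element is an integer. The sketch below shows one way to cast it back; df4 and typed are names introduced for this example only.

```python
import pandas as pd

# Same shape of data as the tuple example above
df4 = pd.DataFrame({"A": [(10, 20), (30, 40)], "B": ["Spark", "PySpark"]})

exploded = df4.explode("A")
print(exploded["A"].dtype)    # object

# Cast back to integers once every value is known to be numeric
typed = exploded.assign(A=exploded["A"].astype("int64"))
print(typed["A"].dtype)       # int64
```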

Exploding Nested Lists

Finally, exploding nested lists requires applying the explode() method multiple times to fully expand all levels of the nested lists.


import pandas as pd

# Create DataFrame with nested lists
data = {
    'A': [[["Spark","PySpark","Pandas"]], [['Hadoop'], ['R Programming'], ['Hyperion']], [['C++'], ["Java","Python"]]],
    'B': [10, 20, 30]
}

df = pd.DataFrame(data)
print("Original DataFrame:\n", df)

# Explode nested lists in column 'A'
df_exploded = df.explode('A').explode('A')
print("DataFrame after exploding nested lists in column 'A':\n", df_exploded)

In the above example, the explode() method is applied twice to column A, first to expand the outer lists and then to expand the inner lists. This results in a DataFrame where each nested element in column A is expanded into a separate row, while the corresponding values in column B are repeated. This example yields the below output.


# Output:
Original DataFrame:
                                          A   B
0               [[Spark, PySpark, Pandas]]  10
1  [[Hadoop], [R Programming], [Hyperion]]  20
2                  [[C++], [Java, Python]]  30
DataFrame after exploding nested lists in column 'A':
                A   B
0          Spark  10
0        PySpark  10
0         Pandas  10
1         Hadoop  20
1  R Programming  20
1       Hyperion  20
2            C++  30
2           Java  30
2         Python  30
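
When the nesting depth is not known in advance, the repeated calls can be wrapped in a loop. The helper below, explode_fully, is a hypothetical function written for this sketch and is not part of pandas; it assumes the column holds lists, tuples, or scalars, and relies on pandas.api.types.is_list_like, which treats strings as scalars.

```python
import pandas as pd
from pandas.api.types import is_list_like

def explode_fully(frame, column):
    """Hypothetical helper: explode `column` until no list-like values remain."""
    result = frame
    while result[column].map(is_list_like).any():
        result = result.explode(column)
    return result

# Rows with different nesting depths, including a plain scalar
nested = pd.DataFrame({
    "A": [[["Spark", "PySpark"]], [["Hadoop"], ["R Programming"]], "C++"],
    "B": [10, 20, 30],
})
flat = explode_fully(nested, "A")
print(flat)
```

The loop stops on its own because each pass removes one level of nesting, and empty lists turn into NaN, which is not list-like.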

Frequently Asked Questions on Pandas DataFrame explode() Method

What does the explode() method do in a Pandas DataFrame?

The explode() method transforms each element of a list-like column into a separate row, expanding the DataFrame while repeating the index label and the other columns' values for each new row.

Can I use explode() on multiple columns at once?

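Yes. In pandas 1.3 and later you can pass a list of columns, for example df.explode(['A', 'C']), as long as the list-likes in those columns have matching lengths on each row. You can also chain explode() calls, which expands the columns one after another and produces every combination of their elements.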

How does explode() handle NaN or None values?

The explode() method preserves NaN or None values in the resulting DataFrame. If the column to be exploded contains NaN or None, those entries will remain unchanged and will not be expanded.

Can I use explode() on columns containing tuples?

Yes. The explode() method works with list-like values, including lists, tuples, sets, Series, and NumPy arrays, within a DataFrame column.

How can I handle nested lists with explode()?

To handle nested lists, apply the explode() method multiple times to fully expand all levels of the nested lists.

Conclusion

In this article, I have explained the Pandas DataFrame explode() method, covering its syntax, parameters, and usage. This method transforms each element of a specified list-like column into a separate row, repeating the index labels and the other columns' values for the new rows. To expand several list-like columns, chain explode() so that each column is exploded in turn.

Happy Learning!!

Reference

https://pandas.pydata.org/docs/reference/api/pandas.Series.explode.html