In Polars, the DataFrame.explode() method transforms columns containing lists or arrays into separate rows. Each element in the list or array is placed in its own row, with the values in the other columns repeated accordingly. This operation flattens the data, creating a long-format DataFrame where each item from the list (or array) becomes an individual row.
In this article, I will explain the Polars DataFrame explode() method, covering its syntax, parameters, and usage, and how it returns a new DataFrame where the specified columns are exploded into separate rows.
Key Points –
- The explode() method is used to flatten columns containing lists or arrays into individual rows, duplicating other columns as necessary.
- You can explode a single column that contains lists or arrays, resulting in multiple rows for each element in the list.
- You can explode multiple columns at the same time, ensuring that the relationships between the elements of the exploded columns are preserved.
- You can pass expressions (Expr) as columns, allowing you to apply transformations or calculations before exploding the data.
- explode() is highly efficient in Polars due to its optimized execution engine, especially on larger datasets.
- It returns a new DataFrame with the exploded columns, leaving the original DataFrame unchanged.
Polars DataFrame explode() Introduction
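Let’s look at the syntax of the Polars DataFrame explode() method. The type hints differ slightly between Polars releases, so check the API reference for your installed version.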
# Syntax of explode()
DataFrame.explode(
columns: str | Expr | Sequence[str | Expr],
*more_columns: str | Expr
) → DataFrame
Parameters of the Polars DataFrame.explode()
Following are the parameters of the Polars DataFrame.explode() method.
- columns (str | Expr | Sequence[str | Expr]) – The column or columns that you want to explode. You can pass:
  - A single column name as a string (str).
  - A single expression (Expr).
  - A sequence (list or tuple) of column names or expressions (Sequence[str | Expr]), which allows you to explode multiple columns at once.
- more_columns (str | Expr) – Additional columns to be exploded, passed as individual arguments. This is useful when you need to explode multiple columns.
Return Value
It returns a new DataFrame where the specified columns are exploded, turning lists or arrays into individual rows.
Usage of Polars DataFrame.explode() Method
The Polars DataFrame.explode() method transforms list-like column values into individual rows, expanding the DataFrame’s structure. It is particularly useful when a column holds lists or arrays, because the flattened data is easier to filter, join, and aggregate.
To run some examples of the Polars DataFrame.explode() method, let’s create a Polars DataFrame.
import polars as pl
# Create a Polars DataFrame with lists in some columns
df = pl.DataFrame({
    'ID': [1, 2, 3],
    'Courses': [["Spark", "PySpark"], ["Pandas"], ["Hadoop", "C++"]],
    'Duration': [['30days', '40days'], ["50days"], ['60days', '45days']]
})
print("Original DataFrame:\n", df)
This creates a DataFrame with three rows, where the Courses and Duration columns each hold lists of strings.
To explode a single column in a Polars DataFrame, you can use the explode() method, specifying the column you want to explode. This will flatten the lists in the specified column, creating multiple rows where each list element gets its own row.
# Exploding the 'Courses' column
df2 = df.explode('Courses')
print("DataFrame after exploding 'Courses' column:\n", df2)
Here,
- The explode('Courses') method flattens the Courses column, creating one row for each item in the list for each original row.
- The Duration column is not exploded, so each new row repeats the full Duration list from its original row.
Exploding Multiple Columns
Alternatively, to explode multiple columns in a Polars DataFrame, you can use the explode() method and pass the names of the columns you want to explode as arguments. This will flatten the lists in each specified column, creating multiple rows where each list element gets its own row, while maintaining the relationship between the columns. For this to work, the lists in the exploded columns must have matching lengths within each row.
# Exploding both 'Courses' and 'Duration' columns
df2 = df.explode('Courses', 'Duration')
print("DataFrame after exploding 'Courses' and 'Duration' columns:\n", df2)
# Output:
# DataFrame after exploding 'Courses' and 'Duration' columns:
# shape: (5, 3)
┌─────┬─────────┬──────────┐
│ ID ┆ Courses ┆ Duration │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞═════╪═════════╪══════════╡
│ 1 ┆ Spark ┆ 30days │
│ 1 ┆ PySpark ┆ 40days │
│ 2 ┆ Pandas ┆ 50days │
│ 3 ┆ Hadoop ┆ 60days │
│ 3 ┆ C++ ┆ 45days │
└─────┴─────────┴──────────┘
Here,
- The explode('Courses', 'Duration') method flattens both the Courses and Duration columns.
- The relationship between the exploded columns is maintained, so the first course ("Spark") corresponds to the first duration ("30days"), and so on.
Exploding a Column with Null Values
When exploding a column containing null values in a Polars DataFrame, the explode() method keeps those rows. A null entry has no elements to spread across rows, so it produces a single output row with null in the exploded column rather than being dropped.
import polars as pl
# Create a Polars DataFrame with lists and null values
df = pl.DataFrame({
    'ID': [1, 2, 3, 4],
    'Courses': [["Spark", "PySpark"], None, ["Hadoop", "C++"], ["Pandas"]],
    'Duration': [['30days', '40days'], None, ['60days', '45days'], ["50days"]]
})
# Explode the 'Courses' column
df2 = df.explode('Courses')
print("DataFrame After Exploding 'Courses':\n", df2)
# Output:
# DataFrame After Exploding 'Courses':
# shape: (6, 3)
┌─────┬─────────┬──────────────────────┐
│ ID ┆ Courses ┆ Duration │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ list[str] │
╞═════╪═════════╪══════════════════════╡
│ 1 ┆ Spark ┆ ["30days", "40days"] │
│ 1 ┆ PySpark ┆ ["30days", "40days"] │
│ 2 ┆ null ┆ null │
│ 3 ┆ Hadoop ┆ ["60days", "45days"] │
│ 3 ┆ C++ ┆ ["60days", "45days"] │
│ 4 ┆ Pandas ┆ ["50days"] │
└─────┴─────────┴──────────────────────┘
Here,
- Rows with null in the Courses column are retained in the output. Each null yields exactly one row, with null in the exploded column.
- Non-exploded columns retain their original values, including lists or nulls.
Exploding a Column with Empty Lists
Similarly, when exploding a column containing empty lists in Polars, the explode() method retains those rows. An empty list has no elements, so it produces a single output row with null in the exploded column instead of disappearing from the result.
import polars as pl
# Create a Polars DataFrame with lists and empty lists
df = pl.DataFrame({
    'ID': [1, 2, 3, 4],
    'Courses': [["Spark", "PySpark"], [], ["Hadoop", "C++"], ["Pandas"]],
    'Duration': [['30days', '40days'], [], ['60days', '45days'], ["50days"]]
})
# Explode the 'Courses' column
df2 = df.explode('Courses')
print("DataFrame After Exploding 'Courses':\n", df2)
# Output:
# DataFrame After Exploding 'Courses':
# shape: (6, 3)
┌─────┬─────────┬──────────────────────┐
│ ID ┆ Courses ┆ Duration │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ list[str] │
╞═════╪═════════╪══════════════════════╡
│ 1 ┆ Spark ┆ ["30days", "40days"] │
│ 1 ┆ PySpark ┆ ["30days", "40days"] │
│ 2 ┆ null ┆ [] │
│ 3 ┆ Hadoop ┆ ["60days", "45days"] │
│ 3 ┆ C++ ┆ ["60days", "45days"] │
│ 4 ┆ Pandas ┆ ["50days"] │
└─────┴─────────┴──────────────────────┘
Here,
- Rows with empty lists retain their ID values, but the exploded column (Courses) will show null for those rows. This behavior ensures the integrity of the DataFrame, maintaining row alignment.
- Other columns (Duration in this case) are not exploded unless explicitly mentioned.
Conclusion
In this article, I have explained the Polars DataFrame explode() method, covering its syntax, parameters, and usage. This method transforms list-like column values into individual rows, flattening the specified column while retaining the structure of the other columns.
Happy Learning!!