Explain PySpark explode_outer() Function

In PySpark, the explode_outer() function is used to explode array or map columns into multiple rows, just like the explode() function, but with one key difference: it retains rows even when arrays or maps are null or empty.

It is part of the pyspark.sql.functions module and is particularly useful when working with nested structures such as arrays, maps, JSON, or structs where you don’t want to lose records that have null or empty values.

In this article, we’ll explore how explode_outer() works, understand its behavior with null and empty arrays, and cover use cases such as exploding arrays, maps, JSON data, and comparing it with the regular explode() function.

Key Points

  • explode_outer() expands array or map columns into multiple rows.
  • Unlike explode(), it keeps rows with null or empty arrays/maps.
  • When applied to arrays, each element becomes a separate row.
  • When applied to maps, each key-value pair becomes a separate row.
  • Ideal for retaining all records, even when data is missing or incomplete.
  • Can be chained across multiple columns to flatten nested data (Spark allows one generator per select).
  • Widely used when parsing nested JSON or semi-structured data.

PySpark explode_outer() Function

The explode_outer() function generates a new row for each element in an array or each key-value pair in a map column, similar to explode(). However, unlike explode(), it retains records where the column is null or empty, returning rows with null values instead of omitting them.

Syntax

Following is the syntax of the explode_outer() function.


# Syntax of the explode_outer()
from pyspark.sql.functions import explode_outer
explode_outer(col)

Parameters

  • col: The column name or expression containing an array or map to be exploded.

Return Value

Returns a column that produces multiple rows, one for each array element or map entry. If the input column is null, it returns a single row with null values instead of removing the record.

Let’s start with a sample DataFrame containing arrays and maps.


# Create SparkSession and Prepare sample Data
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, explode_outer, col

spark = SparkSession.builder.appName('pyspark-by-examples').getOrCreate()

arrayData = [
    ('James', ['Java', 'Scala'], {'hair': 'black', 'eye': 'brown'}),
    ('Michael', ['Spark', 'Java', None], {'hair': 'brown', 'eye': None}),
    ('Robert', ['CSharp', ''], {'hair': 'red', 'eye': ''}),
    ('Washington', None, None),
    ('Jefferson', ['1', '2'], {})
]

df = spark.createDataFrame(data=arrayData, schema=['name', 'knownLanguages', 'properties'])
df.printSchema()
df.show(truncate=False)

Yields the output below.


# Output:
root
 |-- name: string (nullable = true)
 |-- knownLanguages: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- properties: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

+----------+-------------------+-----------------------------+
|name      |knownLanguages     |properties                   |
+----------+-------------------+-----------------------------+
|James     |[Java, Scala]      |{eye -> brown, hair -> black}|
|Michael   |[Spark, Java, NULL]|{eye -> NULL, hair -> brown} |
|Robert    |[CSharp, ]         |{eye -> , hair -> red}       |
|Washington|NULL               |NULL                         |
|Jefferson |[1, 2]             |{}                           |
+----------+-------------------+-----------------------------+

PySpark explode_outer() on Array Column

You can use explode_outer() on an array-type column to expand each element into a separate row, while preserving records that contain null or empty arrays.


# explode_outer() on array column
df_outer_array = df.select(df.name, explode_outer(df.knownLanguages).alias("language"))
df_outer_array.show(truncate=False)

Yields the output below.


# Output:
+----------+--------+
|name      |language|
+----------+--------+
|James     |Java    |
|James     |Scala   |
|Michael   |Spark   |
|Michael   |Java    |
|Michael   |NULL    |
|Robert    |CSharp  |
|Robert    |        |
|Washington|NULL    |
|Jefferson |1       |
|Jefferson |2       |
+----------+--------+

Unlike explode(), the row for Washington is not dropped even though the knownLanguages column is null.

explode_outer() on Map Column

When applied to a map column, explode_outer() converts each key-value pair into a separate row.
If the map is null or empty, it returns a single row with nulls for the key and value columns.


# explode_outer() on map column
df_outer_map = df.select(df.name, explode_outer(df.properties))
df_outer_map.show(truncate=False)

Yields the output below.


# Output:
+----------+----+-----+
|name      |key |value|
+----------+----+-----+
|James     |eye |brown|
|James     |hair|black|
|Michael   |eye |NULL |
|Michael   |hair|brown|
|Robert    |eye |     |
|Robert    |hair|red  |
|Washington|NULL|NULL |
|Jefferson |NULL|NULL |
+----------+----+-----+

Even for Washington and Jefferson, the function keeps their rows with NULL values.
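
By default, the map version of the generator produces columns named key and value. Since it returns two columns, you can pass two names to .alias() to rename them in one step. A minimal sketch; the names attribute and attribute_value are just illustrative:


# Rename the generated key/value columns with a two-name alias
df_named = df.select(df.name, explode_outer(df.properties).alias("attribute", "attribute_value"))
df_named.show(truncate=False)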

explode_outer() on JSON Column

You can use from_json() and explode_outer() together to flatten nested JSON data that includes arrays.
This helps extract array elements while keeping null records intact.


# explode_outer() on JSON column
from pyspark.sql.functions import from_json, schema_of_json

# Infer the schema from a sample JSON string
json_schema = schema_of_json('{"lang":["Python","Java"],"level":"Intermediate"}')

data = [
    ("James", '{"lang":["Python","Java"],"level":"Intermediate"}'),
    ("Michael", None)
]
df_json = spark.createDataFrame(data, ["name", "json_data"])

# Parse the JSON string and explode the nested array
df_parsed = df_json.withColumn("parsed", from_json(col("json_data"), json_schema))
df_exploded_json = df_parsed.select("name", explode_outer(col("parsed.lang")).alias("language"))
df_exploded_json.show(truncate=False)

Yields the output below.


# Output:
+-------+--------+
|name   |language|
+-------+--------+
|James  |Python  |
|James  |Java    |
|Michael|NULL    |
+-------+--------+

Even though Michael’s json_data is null, the row is preserved with a null value for language.
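
If you also need scalar fields from the parsed struct, select them alongside the generator; Spark allows one generator per select() plus any number of ordinary columns. A minimal sketch reusing df_parsed from above; the level alias is just illustrative:


# Select a scalar struct field alongside the exploded array
df_json_level = df_parsed.select(
    "name",
    col("parsed.level").alias("level"),
    explode_outer(col("parsed.lang")).alias("language")
)
df_json_level.show(truncate=False)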

explode() vs explode_outer()

The table below shows how explode() and explode_outer() handle columns that contain null values.

Function           Description
explode()          Skips rows where the array/map is null or empty.
explode_outer()    Includes rows even when the array/map is null or empty (returns null values).

If you want to handle null values gracefully, use explode_outer(): even when the column is null, it returns a row with a null value in the exploded column instead of dropping the record. In contrast, explode() skips such records and returns no rows for them.

Example:


# Difference between explode() and explode_outer()
from pyspark.sql.functions import explode

df_explode = df.select(df.name, explode(df.knownLanguages).alias("language"))
df_explode_outer = df.select(df.name, explode_outer(df.knownLanguages).alias("language"))

print("explode() result:")
df_explode.show()

print("explode_outer() result:")
df_explode_outer.show()

Yields the output below.


# Output:
explode() result:
+---------+--------+
|     name|language|
+---------+--------+
|    James|    Java|
|    James|   Scala|
|  Michael|   Spark|
|  Michael|    Java|
|  Michael|    NULL|
|   Robert|  CSharp|
|   Robert|        |
|Jefferson|       1|
|Jefferson|       2|
+---------+--------+

explode_outer() result:
+----------+--------+
|      name|language|
+----------+--------+
|     James|    Java|
|     James|   Scala|
|   Michael|   Spark|
|   Michael|    Java|
|   Michael|    NULL|
|    Robert|  CSharp|
|    Robert|        |
|Washington|    NULL|
| Jefferson|       1|
| Jefferson|       2|
+----------+--------+
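
A quick way to confirm the difference programmatically is to compare row counts. Based on the outputs above:


# Row counts differ by the null-array record (Washington)
print(df_explode.count())        # 9  - Washington's row is dropped
print(df_explode_outer.count())  # 10 - Washington's row is kept with NULL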

Frequently Asked Questions on PySpark explode_outer()

What does explode_outer() do in PySpark?

It converts array or map columns into multiple rows and retains records even when the column is null or empty.

How is it different from explode()?

explode() skips null or empty arrays, while explode_outer() includes them.

When should I use explode_outer()?

When you want to preserve all records, even if some columns contain null or missing data.

How can I use explode_outer() on multiple columns?

Spark allows only one generator (such as explode_outer()) per select() clause, so you cannot call it twice in the same select(). Instead, chain transformations and explode one column per step, as shown in the sketch below.
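
A minimal sketch, reusing the df defined earlier, that flattens both the array and the map column one step at a time:


# Explode the array column first, then the map column (one generator per select)
df_multi = (
    df.select("name", explode_outer("knownLanguages").alias("language"), "properties")
      .select("name", "language", explode_outer("properties"))
)
df_multi.show(truncate=False)

Note that chaining generators yields the cross-product of the exploded values for each input row.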

How does explode_outer() work with JSON data?

It pairs naturally with from_json(): parse the JSON string into a struct first, then apply explode_outer() to the nested arrays so that null records are kept.

Conclusion

In this article, you learned how to use PySpark explode_outer() to flatten arrays and maps into multiple rows while retaining null or empty records.
We covered how it behaves with arrays, maps, and JSON data and compared it with the regular explode() function.

The explode_outer() function is especially useful when working with incomplete, hierarchical, or semi-structured data, where you want to preserve every record during transformations.

Happy Learning!!
