In PySpark, the explode_outer() function is used to explode array or map columns into multiple rows, just like the explode() function, but with one key difference: it retains rows even when arrays or maps are null or empty.
It is part of the pyspark.sql.functions module and is particularly useful when working with nested structures such as arrays, maps, JSON, or structs, where you don't want to lose records that contain null or empty values.
In this article, we’ll explore how explode_outer() works, understand its behavior with null and empty arrays, and cover use cases such as exploding arrays, maps, JSON data, and comparing it with the regular explode() function.
Key Points
- explode_outer() expands array or map columns into multiple rows.
- Unlike explode(), it keeps rows with null or empty arrays/maps.
- When applied to arrays, each element becomes a separate row.
- When applied to maps, each key-value pair becomes a separate row.
- Ideal for retaining all records, even when data is missing or incomplete.
- Can flatten multiple nested columns by chaining calls (Spark allows one generator per select), as shown in the combined example after the map section.
- Widely used when parsing nested JSON or semi-structured data.
PySpark explode_outer() Function
The explode_outer() function generates a new row for each element in an array or each key-value pair in a map column, similar to explode(). However, unlike explode(), it retains records where the column is null or empty, returning rows with null values instead of omitting them.
Syntax
Following is the syntax of the explode_outer() function.
# Syntax of the explode_outer()
from pyspark.sql.functions import explode_outer
explode_outer(col)
Parameters
col: The column name or expression containing an array or map to be exploded.
Return Value
Returns a column that produces multiple rows, one for each array element or map entry. If the input column is null, it returns a single row with null values instead of removing the record.
Let’s start with a sample DataFrame containing arrays and maps.
# Create SparkSession and prepare sample data
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, explode_outer, col
spark = SparkSession.builder.appName('pyspark-by-examples').getOrCreate()
arrayData = [
('James', ['Java', 'Scala'], {'hair': 'black', 'eye': 'brown'}),
('Michael', ['Spark', 'Java', None], {'hair': 'brown', 'eye': None}),
('Robert', ['CSharp', ''], {'hair': 'red', 'eye': ''}),
('Washington', None, None),
('Jefferson', ['1', '2'], {})
]
df = spark.createDataFrame(data=arrayData, schema=['name', 'knownLanguages', 'properties'])
df.printSchema()
df.show(truncate=False)
This yields the schema below; df.show() then displays the five sample rows.
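# Output:
root
 |-- name: string (nullable = true)
 |-- knownLanguages: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- properties: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)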

PySpark explode_outer() on Array Column
You can use explode_outer() on an array-type column to expand each element into a separate row, while preserving records that contain null or empty arrays.
# explode_outer() on array column
df_outer_array = df.select(df.name, explode_outer(df.knownLanguages).alias("language"))
df_outer_array.show(truncate=False)
This yields the output below.
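# Output:
+----------+--------+
|name      |language|
+----------+--------+
|James     |Java    |
|James     |Scala   |
|Michael   |Spark   |
|Michael   |Java    |
|Michael   |NULL    |
|Robert    |CSharp  |
|Robert    |        |
|Washington|NULL    |
|Jefferson |1       |
|Jefferson |2       |
+----------+--------+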
Unlike with explode(), the Washington row is not dropped even though its knownLanguages column is null.
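The same applies to an empty array: explode_outer() returns a single row with NULL instead of dropping the record. Below is a minimal sketch; the df_empty DataFrame is illustrative, with an explicit schema because PySpark cannot infer the element type of an empty list.
# explode_outer() on an empty array yields one row with NULL
from pyspark.sql.types import StructType, StructField, StringType, ArrayType
schema = StructType([
    StructField("name", StringType(), True),
    StructField("langs", ArrayType(StringType()), True)
])
df_empty = spark.createDataFrame([("Anna", [])], schema)
df_empty.select("name", explode_outer("langs").alias("language")).show()
# +----+--------+
# |name|language|
# +----+--------+
# |Anna|    NULL|
# +----+--------+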
explode_outer() on Map Column
When applied to a map column, explode_outer() converts each key-value pair into a separate row.
If the map is null or empty, it returns a single row with nulls for the key and value columns.
# explode_outer() on map column
df_outer_map = df.select(df.name, explode_outer(df.properties))
df_outer_map.show(truncate=False)
This yields the output below.
# Output:
+----------+----+-----+
|name      |key |value|
+----------+----+-----+
|James     |eye |brown|
|James     |hair|black|
|Michael   |eye |NULL |
|Michael   |hair|brown|
|Robert    |eye |     |
|Robert    |hair|red  |
|Washington|NULL|NULL |
|Jefferson |NULL|NULL |
+----------+----+-----+
Even for Washington and Jefferson, the function keeps their rows with NULL values.
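Because Spark allows only one generator expression per select(), flattening both the array and the map requires chaining explode_outer() calls rather than combining them in a single projection. Here is a minimal sketch using the same df; the aliases language, prop_key, and prop_value are illustrative. Note that each input row produces one output row per combination of array element and map entry.
# Chain explode_outer() calls to flatten multiple nested columns
# (Spark permits only one generator per select clause)
df_flat = (
    df.select("name", explode_outer("knownLanguages").alias("language"), "properties")
      .select("name", "language", explode_outer("properties").alias("prop_key", "prop_value"))
)
df_flat.show(truncate=False)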
explode_outer() on JSON Column
You can use from_json() and explode_outer() together to flatten nested JSON data that includes arrays.
This helps extract array elements while keeping null records intact.
# explode_outer() on JSON column
from pyspark.sql.functions import from_json, schema_of_json
json_schema = schema_of_json('{"lang":["Python","Java"],"level":"Intermediate"}')
data = [("James", '{"lang":["Python","Java"],"level":"Intermediate"}'),
        ("Michael", None)]
df_json = spark.createDataFrame(data, ["name", "json_data"])
df_parsed = df_json.withColumn("parsed", from_json(col("json_data"), json_schema))
df_exploded_json = df_parsed.select("name", explode_outer(col("parsed.lang")).alias("language"))
df_exploded_json.show(truncate=False)
This yields the output below.
# Output:
+-------+--------+
|name   |language|
+-------+--------+
|James  |Python  |
|James  |Java    |
|Michael|NULL    |
+-------+--------+
Even though Michael’s json_data is null, the row is preserved with a null value for language.
explode() vs explode_outer()
The table below shows how explode() and explode_outer() handle columns that contain null values.
| Function | Description |
|---|---|
| explode() | Skips rows where array/map is null or empty. |
| explode_outer() | Includes rows even when array/map is null or empty (returns null values). |
If you want to handle null values gracefully, use explode_outer(), which ensures that even when the column is null, it returns a row with a null value in the exploded column instead of dropping the record. In contrast, explode() skips such records and returns no rows for null columns.
Example:
# Difference between explode() and explode_outer()
from pyspark.sql.functions import explode
df_explode = df.select(df.name, explode(df.knownLanguages).alias("language"))
df_explode_outer = df.select(df.name, explode_outer(df.knownLanguages).alias("language"))
print("explode() result:")
df_explode.show()
print("explode_outer() result:")
df_explode_outer.show()
This yields the output below.
# Output:
explode() result:
+---------+--------+
|     name|language|
+---------+--------+
|    James|    Java|
|    James|   Scala|
|  Michael|   Spark|
|  Michael|    Java|
|  Michael|    NULL|
|   Robert|  CSharp|
|   Robert|        |
|Jefferson|       1|
|Jefferson|       2|
+---------+--------+
explode_outer() result:
+----------+--------+
|      name|language|
+----------+--------+
|     James|    Java|
|     James|   Scala|
|   Michael|   Spark|
|   Michael|    Java|
|   Michael|    NULL|
|    Robert|  CSharp|
|    Robert|        |
|Washington|    NULL|
| Jefferson|       1|
| Jefferson|       2|
+----------+--------+
Frequently Asked Questions about PySpark explode_outer()
What does explode_outer() do?
It converts array or map columns into multiple rows and retains records even when the column is null or empty.
How is it different from explode()?
explode() skips null or empty arrays, while explode_outer() includes them.
When should you use explode_outer()?
When you want to preserve all records, even if some columns contain null or missing data.
Can explode_outer() be applied to multiple columns?
Yes, but Spark allows only one generator per select(), so chain explode_outer() calls (or use withColumn()) to expand multiple columns one at a time.
Does explode_outer() work with JSON data?
Yes, it pairs well with from_json() to flatten arrays inside JSON structures.
Conclusion
In this article, you learned how to use PySpark explode_outer() to flatten arrays and maps into multiple rows while retaining null or empty records.
We covered how it behaves with arrays, maps, and JSON data and compared it with the regular explode() function.
The explode_outer() function is especially useful when working with incomplete, hierarchical, or semi-structured data, where you want to preserve every record during transformations.
Happy Learning!!
Related Articles
- PySpark – explode nested array into rows
- PySpark MapType (Dict) Usage with Examples
- PySpark Convert Dictionary/Map to Multiple Columns
- PySpark ArrayType Column With Examples
- PySpark map() Transformation
- PySpark array_contains() function with examples
- Explain PySpark element_at() with Examples
- Iterate over Elements of Array in PySpark DataFrame
- Explain the posexplode() function with examples
- Explain the posexplode_outer() function with examples