Posexplode_outer() in PySpark is a powerful function designed to explode or flatten array or map columns into multiple rows while retaining the position (index) of each element. Unlike posexplode(), which skips rows with null or empty arrays/maps, posexplode_outer() produces rows even when the array or map is null or empty by returning (null, null) for position and element columns.
In this article, I will explain the PySpark posexplode_outer() function, including its syntax, parameters, and practical usage. You’ll learn how to use it to explode array or map columns in a DataFrame into multiple rows while retaining the position (index) of each element. Additionally, I’ll show how it returns null values for rows where the array or map columns are null or empty.
Key Points
- It returns a new row for each element in an array or each key-value pair in a map column.
- It includes an additional column for the position of each element within the array or map.
- If the array or map column is null or empty, it produces a row with (null, null) values instead of dropping the row.
- By default, the resulting columns are named pos for the position and col for the element in an array; for a map, it produces pos, key, and value columns.
- This function combines the functionality of explode_outer() (which retains null/empty array rows) and posexplode() (which adds element positions).
PySpark posexplode_outer() Function
The PySpark posexplode_outer() function operates similarly to posexplode(), generating a new row for each element in an array or map along with its position. The key difference is that posexplode_outer() retains rows where the array or map is null or empty by generating rows with null positions and values, whereas posexplode() would skip such rows.
Syntax
Following is the syntax of the posexplode_outer() function.
# Syntax of the posexplode_outer()
from pyspark.sql.functions import posexplode_outer
posexplode_outer(col)
Parameters
col: The name or expression of the column containing an array or map.
Return Value
- Returns new columns with each row representing an element of an array or a key-value pair from a map, along with its position.
- For null or empty arrays/maps, produces a row with null values.
Let’s start with a sample DataFrame containing arrays and maps.
# Create SparkSession and Prepare sample Data
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('pyspark-by-examples').getOrCreate()
arrayData = [
('James', ['Java', 'Scala'], {'hair': 'black', 'eye': 'brown'}),
('Michael', ['Spark', 'Java', None], {'hair': 'brown', 'eye': None}),
('Robert', ['CSharp', ''], {'hair': 'red', 'eye': ''}),
('Washington', None, None),
('Jefferson', ['1', '2'], {})
]
df = spark.createDataFrame(data=arrayData, schema=['name', 'knownLanguages', 'properties'])
df.printSchema()
df.show(truncate=False)
Yields below the schema output.

# Output:
root
 |-- name: string (nullable = true)
 |-- knownLanguages: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- properties: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)
Using PySpark posexplode_outer() on Array Column
You can apply posexplode_outer() to an array column to create a new row for each element. Unlike posexplode(), posexplode_outer() retains rows where arrays are null or empty by producing rows with null positions and values.
# Using posexplode_outer() on Array Column
from pyspark.sql.functions import posexplode_outer
df_outer = df.select(df.name, posexplode_outer(df.knownLanguages))
df_outer.show(truncate=False)
Yields below the output.

# Output:
+----------+----+------+
|name      |pos |col   |
+----------+----+------+
|James     |0   |Java  |
|James     |1   |Scala |
|Michael   |0   |Spark |
|Michael   |1   |Java  |
|Michael   |2   |NULL  |
|Robert    |0   |CSharp|
|Robert    |1   |      |
|Washington|NULL|NULL  |
|Jefferson |0   |1     |
|Jefferson |1   |2     |
+----------+----+------+
As shown, rows with null or empty arrays (like “Washington”) are retained with (null, null) values.
Using PySpark posexplode_outer() on Map Column
posexplode_outer() on a map column generates rows with position, key, and value. Null or empty maps generate a single row with nulls.
# Posexplode_outer() map column
df_outer_map = df.select(df.name, posexplode_outer(df.properties).alias("pos", "key", "value"))
df_outer_map.show(truncate=False)
Yields below the output.
# Output:
+----------+----+----+-----+
|name |pos |key |value|
+----------+----+----+-----+
|James |0 |eye |brown|
|James |1 |hair|black|
|Michael |0 |eye |NULL |
|Michael |1 |hair|brown|
|Robert |0 |eye | |
|Robert |1 |hair|red |
|Washington|NULL|NULL|NULL |
|Jefferson |NULL|NULL|NULL |
+----------+----+----+-----+
Even for null or empty maps, posexplode_outer() generates a row with nulls.
Compare posexplode_outer() vs posexplode()
The table below highlights the key differences between posexplode_outer() and posexplode() in PySpark:
| Feature | posexplode() | posexplode_outer() |
|---|---|---|
| Handles null/empty arrays | Skips rows | Retains rows with null outputs |
| Output columns for arrays | pos, col | pos, col |
| Output columns for maps | pos, key, value | pos, key, value |
| Useful when order matters | Yes | Yes |
| Preserves null/empty rows | No | Yes |
Frequently Asked Questions of PySpark posexplode_outer()

How is posexplode_outer() different from posexplode()?
posexplode() skips null or empty arrays/maps, while posexplode_outer() includes them with (null, null) placeholders.

When should I use posexplode_outer()?
Use it when you want to preserve all rows, even if some arrays or maps are missing or empty.

Does posexplode_outer() work on both arrays and maps?
Yes, it supports both. Arrays return pos and col columns; maps return pos, key, and value columns.

Why is posexplode_outer() useful?
It prevents data loss by retaining rows that would otherwise be dropped during explosion.
Conclusion
In this article, I explained the posexplode_outer() function in PySpark with examples using arrays and maps.
You learned how it differs from posexplode(), when to use it, and how it helps retain null or empty array/map rows.
Use posexplode_outer() when working with nested data structures where preserving all rows, including nulls, is important.
Happy Learning!!
Related Articles
- PySpark explode_outer() Function
- PySpark MapType (Dict) Usage with Examples
- PySpark ArrayType Column With Examples
- PySpark explode() Function
- PySpark – explode nested array into rows