In PySpark, the posexplode() function is used to explode an array or map column into multiple rows, just like explode(), but with an additional positional index column. This index column represents the position of each element in the array (starting from 0), which is useful for tracking element order or performing position-based operations.
The posexplode() function is part of the pyspark.sql.functions module and is commonly used when working with arrays, maps, and nested JSON data.
Key Points
- posexplode() creates a new row for each element of an array or key-value pair of a map.
- It adds a position index column (pos) showing the element's position within the array.
- When used with arrays, it returns two columns: pos and col.
- When used with maps, it returns three columns: pos, key, and value.
- Rows with null or empty arrays are removed by default.
- Use posexplode_outer() to retain rows even when arrays or maps are null or empty.
- Ideal for flattening complex or nested data while retaining element order.
PySpark posexplode() Function
The PySpark posexplode() function generates a new row for each element in an array or map along with its position. By default, it assigns the column name pos to represent the element’s position and col for the element itself when used with arrays, or key and value when used with maps, unless you specify custom names.
Syntax
Following is the syntax of the posexplode() function.
# Syntax of the posexplode()
from pyspark.sql.functions import posexplode
posexplode(col)
Parameters
col:The column name or expression containing an array or map to be exploded.
Return Value
It returns new columns in which each row represents one array element or map key-value pair, along with its position.
Let’s start with a sample DataFrame containing arrays and maps.
# Create SparkSession and Prepare sample Data
from pyspark.sql import SparkSession
from pyspark.sql.functions import posexplode, col
spark = SparkSession.builder.appName('pyspark-by-examples').getOrCreate()
arrayData = [
('James', ['Java', 'Scala'], {'hair': 'black', 'eye': 'brown'}),
('Michael', ['Spark', 'Java', None], {'hair': 'brown', 'eye': None}),
('Robert', ['CSharp', ''], {'hair': 'red', 'eye': ''}),
('Washington', None, None),
('Jefferson', ['1', '2'], {})
]
df = spark.createDataFrame(data=arrayData, schema=['name', 'knownLanguages', 'properties'])
df.printSchema()
df.show(truncate=False)
Yields the output below.

PySpark posexplode() on Array Column
You can use the posexplode() function on an array column to generate new rows, each containing the element’s index position (pos) and its value (col) in separate columns.
# Posexplode on an array column
df_pos = df.select(df.name, posexplode(df.knownLanguages))
df_pos.show(truncate=False)
Yields the output below. Note that null elements inside an array (such as Michael's) are kept, while rows whose array is null (Washington) are dropped.
# Output:
+---------+---+------+
|name     |pos|col   |
+---------+---+------+
|James    |0  |Java  |
|James    |1  |Scala |
|Michael  |0  |Spark |
|Michael  |1  |Java  |
|Michael  |2  |NULL  |
|Robert   |0  |CSharp|
|Robert   |1  |      |
|Jefferson|0  |1     |
|Jefferson|1  |2     |
+---------+---+------+
PySpark posexplode() on Map Column
You can apply the posexplode() function to a map column in a DataFrame to transform each key-value pair into individual rows. By default, it generates three columns: pos (position), key, and value, unless custom aliases are provided.
# Posexplode map column
df_pos = df.select(df.name, posexplode(df.properties).alias("pos", "key", "value"))
df_pos.show(truncate=False)
Yields the output below. Since posexplode() on a map produces three columns, alias() must be given three names.
# Output:
+-------+---+----+-----+
|name |pos|key |value|
+-------+---+----+-----+
|James |0 |eye |brown|
|James |1 |hair|black|
|Michael|0 |eye |NULL |
|Michael|1 |hair|brown|
|Robert |0 |eye | |
|Robert |1 |hair|red |
+-------+---+----+-----+
This is an ideal PySpark posexplode map example, useful when you need to maintain both order and structure of map-type data.
PySpark posexplode_outer()
When your dataset contains null or empty arrays, posexplode() skips those rows. To retain null rows, use posexplode_outer() instead.
# PySpark posexplode_outer() to get null values
from pyspark.sql.functions import posexplode_outer
df_outer = df.select(df.name, posexplode_outer(df.knownLanguages))
df_outer.show(truncate=False)
Yields the output below.
# Output:
+----------+----+------+
|name |pos |col |
+----------+----+------+
|James |0 |Java |
|James |1 |Scala |
|Michael |0 |Spark |
|Michael |1 |Java |
|Michael |2 |NULL |
|Robert |0 |CSharp|
|Robert |1 | |
|Washington|NULL|NULL |
|Jefferson |0 |1 |
|Jefferson |1 |2 |
+----------+----+------+
PySpark posexplode() JSON Column
You can also apply posexplode() after parsing a JSON column. Here’s an example of PySpark posexplode JSON array using from_json().
# PySpark posexplode() JSON Column
from pyspark.sql.functions import from_json, schema_of_json
json_schema = schema_of_json('{"lang":["Python","Java"],"level":"Intermediate"}')
data_json = [("James", '{"lang":["Python","Java"],"level":"Intermediate"}')]
df_json = spark.createDataFrame(data_json, ["name", "json_data"])
df_parsed = df_json.withColumn("parsed", from_json(col("json_data"), json_schema))
df_exploded_json = df_parsed.select("name", posexplode(col("parsed.lang")).alias("pos", "language"))
df_exploded_json.show(truncate=False)
Yields the output below.
# Output:
+-----+---+--------+
|name |pos|language|
+-----+---+--------+
|James|0 |Python |
|James|1 |Java |
+-----+---+--------+
This gives both the element and its position inside the JSON array.
Compare explode() vs posexplode()
The table below highlights the key differences between explode() and posexplode() in PySpark:
| Function | Description |
|---|---|
| explode() | Generates a new row for each element in an array or map, but does not include the position of elements. |
| posexplode() | Similar to explode(), but adds an additional column indicating the position (index) of each element in the array or map. |
Frequently Asked Questions of PySpark posexplode()

What is posexplode() used for?
It's used to flatten arrays or maps while retaining each element's index position, which is not available in explode().

How is posexplode() different from explode()?
posexplode() includes an additional positional column (pos), while explode() only returns the value.

How do I keep rows with null or empty arrays?
Use posexplode_outer() to retain null or empty rows.

Can I apply multiple posexplode() functions in a single select()?
No. Spark allows only one generator function, such as posexplode(), per select() clause; to flatten multiple columns, chain multiple select() or withColumn() steps.

What are the default column names returned by posexplode()?
For arrays: pos and col. For maps: pos, key, and value. You can rename them using .alias().
Conclusion
In this article, you learned how to use PySpark posexplode() to flatten arrays and maps into multiple rows while retaining each element’s position index.
We also covered:
- Using posexplode() on arrays and maps
- posexplode_outer() function
- Applying posexplode() on JSON data
- Comparison with explode()
The posexplode() function is particularly valuable when the order of elements matters, such as in sequence-based data or position-dependent structures.
Happy Learning!!
Related Articles
- PySpark – explode nested array into rows
- PySpark MapType (Dict) Usage with Examples
- PySpark Convert Dictionary/Map to Multiple Columns
- PySpark ArrayType Column With Examples
- PySpark map() Transformation
- PySpark array_contains() function with examples.
- Explain PySpark element_at() with Examples
- Iterate over Elements of Array in PySpark DataFrame
- Explain PySpark explode_outer() Function