Problem: How do you explode an Array of Map DataFrame column into rows using Spark?
Solution: Spark's explode function can be used to explode an Array of Map (ArrayType(MapType))
column into rows on a Spark DataFrame, as shown in the Scala example below.
Before we start, let's create a DataFrame with a map column inside an array. In the example below, the column "properties" is an array of MapType that holds the properties of a person as key/value pairs.
val arrayMapSchema = new StructType()
  .add("name", StringType)
  .add("properties", ArrayType(MapType(StringType, StringType, true)))
val arrayMapData = Seq(
  Row("James", List(Map("hair" -> "black", "eye" -> "brown"), Map("height" -> "5.9"))),
  Row("Michael", List(Map("hair" -> "brown", "eye" -> "black"), Map("height" -> "6"))),
  Row("Robert", List(Map("hair" -> "red", "eye" -> "gray"), Map("height" -> "6.3")))
)
val df = spark.createDataFrame(
  spark.sparkContext.parallelize(arrayMapData), arrayMapSchema)
df.printSchema()
df.show(false)
df.printSchema() and df.show() return the following schema and table.
root
|-- name: string (nullable = true)
|-- properties: array (nullable = true)
| |-- element: map (containsNull = true)
| | |-- key: string
| | |-- value: string (valueContainsNull = true)
+-------+------------------------------------------------+
|name |properties |
+-------+------------------------------------------------+
|James |[[hair -> black, eye -> brown], [height -> 5.9]]|
|Michael|[[hair -> brown, eye -> black], [height -> 6]] |
|Robert |[[hair -> red, eye -> gray], [height -> 6.3]] |
+-------+------------------------------------------------+
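Conceptually, exploding an array column behaves like a flatMap over the nested collection: each element of the array becomes its own row, paired with the outer columns. The following is a minimal plain-Scala sketch of that behavior (no Spark session required; the object name and data here are illustrative, not part of the Spark API):

```scala
// Plain-Scala sketch of what explode does to an array-of-maps column:
// each (name, properties) pair fans out into one row per map element.
object ExplodeSketch extends App {
  val data = Seq(
    ("James", List(Map("hair" -> "black", "eye" -> "brown"), Map("height" -> "5.9"))),
    ("Michael", List(Map("hair" -> "brown", "eye" -> "black"), Map("height" -> "6")))
  )

  // flatMap mirrors explode: the name is duplicated for every map in its array
  val exploded: Seq[(String, Map[String, String])] =
    data.flatMap { case (name, props) => props.map(m => (name, m)) }

  exploded.foreach(println)
  assert(exploded.size == 4) // two people, two maps each
}
```

Spark performs the same fan-out, but distributed across partitions and without collecting the data to the driver.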
Now, let's explode the "properties" array column into map rows. After exploding, a new column 'col' is created, where each row holds one map.
import spark.implicits._
val df2 = df.select($"name", explode($"properties"))
df2.printSchema()
df2.show(false)
Outputs:
root
|-- name: string (nullable = true)
|-- col: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull = true)
+-------+-----------------------------+
|name |col |
+-------+-----------------------------+
|James |[hair -> black, eye -> brown]|
|James |[height -> 5.9] |
|Michael|[hair -> brown, eye -> black]|
|Michael|[height -> 6] |
|Robert |[hair -> red, eye -> gray] |
|Robert |[height -> 6.3] |
+-------+-----------------------------+
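After this step, each row holds one map. If you also want one row per key/value pair, you can apply explode again on the map column, which produces `key` and `value` columns (e.g. `df2.select($"name", explode($"col"))`). Conceptually this second step is another flatMap, this time over the map's entries; below is a plain-Scala sketch of that flattening (illustrative names, no Spark required):

```scala
// Plain-Scala sketch of exploding a map column into (name, key, value) rows,
// mirroring a second Spark explode applied to the map column.
object ExplodeMapSketch extends App {
  // Two rows for James, as produced by the first explode above
  val rows = Seq(
    ("James", Map("hair" -> "black", "eye" -> "brown")),
    ("James", Map("height" -> "5.9"))
  )

  // Each map entry becomes its own (name, key, value) row
  val keyValueRows: Seq[(String, String, String)] =
    rows.flatMap { case (name, m) => m.map { case (k, v) => (name, k, v) } }

  keyValueRows.foreach(println)
  assert(keyValueRows.size == 3) // 2 entries in the first map + 1 in the second
}
```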
Complete Example
package com.sparkbyexamples.spark.dataframe.functions.collection

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.explode
import org.apache.spark.sql.types._

object ArrayOfMapType extends App {

  val spark = SparkSession.builder()
    .appName("SparkByExamples.com")
    .master("local[1]")
    .getOrCreate()

  // Schema: a name column plus an array of string-to-string maps
  val arrayMapSchema = new StructType()
    .add("name", StringType)
    .add("properties", ArrayType(MapType(StringType, StringType, true)))

  val arrayMapData = Seq(
    Row("James", List(Map("hair" -> "black", "eye" -> "brown"), Map("height" -> "5.9"))),
    Row("Michael", List(Map("hair" -> "brown", "eye" -> "black"), Map("height" -> "6"))),
    Row("Robert", List(Map("hair" -> "red", "eye" -> "gray"), Map("height" -> "6.3")))
  )

  val df = spark.createDataFrame(
    spark.sparkContext.parallelize(arrayMapData), arrayMapSchema)
  df.printSchema()
  df.show(false)

  // Explode the array column: one output row per map element
  import spark.implicits._
  val df2 = df.select($"name", explode($"properties"))
  df2.printSchema()
  df2.show(false)
}
Happy Learning !!