Problem: How do you explode an Array of Map DataFrame column into rows using Spark?
Solution: Spark's explode function can be used to explode an Array of Map (ArrayType(MapType))
column into rows on a Spark DataFrame, as shown in the Scala example below.
Before we start, let's create a DataFrame with a map column inside an array. In the example below, the column "properties" is an array of MapType that holds the properties of a person as key/value pairs.
val arrayMapSchema = new StructType()
  .add("name", StringType)
  .add("properties", ArrayType(MapType(StringType, StringType, true)))
val arrayMapData = Seq(
  Row("James", List(Map("hair" -> "black", "eye" -> "brown"), Map("height" -> "5.9"))),
  Row("Michael", List(Map("hair" -> "brown", "eye" -> "black"), Map("height" -> "6"))),
  Row("Robert", List(Map("hair" -> "red", "eye" -> "gray"), Map("height" -> "6.3")))
)
val df = spark.createDataFrame(
  spark.sparkContext.parallelize(arrayMapData), arrayMapSchema)
df.printSchema()
df.show(false)
df.printSchema() and df.show() return the following schema and table.
root
|-- name: string (nullable = true)
|-- properties: array (nullable = true)
| |-- element: map (containsNull = true)
| | |-- key: string
| | |-- value: string (valueContainsNull = true)
+-------+------------------------------------------------+
|name |properties |
+-------+------------------------------------------------+
|James |[[hair -> black, eye -> brown], [height -> 5.9]]|
|Michael|[[hair -> brown, eye -> black], [height -> 6]] |
|Robert |[[hair -> red, eye -> gray], [height -> 6.3]] |
+-------+------------------------------------------------+
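Conceptually, exploding an array column behaves like a flatMap over the nested collection: each element of the array becomes its own row, paired with the outer columns. The following is a minimal plain-Scala sketch of that behavior (no Spark session required; the object name and data here are illustrative, not part of the Spark API):

```scala
// Plain-Scala sketch of what explode does to an array-of-maps column:
// each (name, properties) pair fans out into one row per map element.
object ExplodeSketch extends App {
  val data = Seq(
    ("James", List(Map("hair" -> "black", "eye" -> "brown"), Map("height" -> "5.9"))),
    ("Michael", List(Map("hair" -> "brown", "eye" -> "black"), Map("height" -> "6")))
  )

  // flatMap mirrors explode: the name is duplicated for every map in its array
  val exploded: Seq[(String, Map[String, String])] =
    data.flatMap { case (name, props) => props.map(m => (name, m)) }

  exploded.foreach(println)
  assert(exploded.size == 4) // two people, two maps each
}
```

Spark performs the same fan-out, but distributed across partitions and without collecting the data to the driver.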
Now, let's explode the "properties" array column into map rows. After exploding, a new column 'col' is created, where each row holds one map.
import spark.implicits._
val df2 = df.select($"name", explode($"properties"))
df2.printSchema()
df2.show(false)
Outputs:
root
|-- name: string (nullable = true)
|-- col: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull = true)
+-------+-----------------------------+
|name |col |
+-------+-----------------------------+
|James |[hair -> black, eye -> brown]|
|James |[height -> 5.9] |
|Michael|[hair -> brown, eye -> black]|
|Michael|[height -> 6] |
|Robert |[hair -> red, eye -> gray] |
|Robert |[height -> 6.3] |
+-------+-----------------------------+
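After this step, each row holds one map. If you also want one row per key/value pair, you can apply explode again on the map column, which produces `key` and `value` columns (e.g. `df2.select($"name", explode($"col"))`). Conceptually this second step is another flatMap, this time over the map's entries; below is a plain-Scala sketch of that flattening (illustrative names, no Spark required):

```scala
// Plain-Scala sketch of exploding a map column into (name, key, value) rows,
// mirroring a second Spark explode applied to the map column.
object ExplodeMapSketch extends App {
  // Two rows for James, as produced by the first explode above
  val rows = Seq(
    ("James", Map("hair" -> "black", "eye" -> "brown")),
    ("James", Map("height" -> "5.9"))
  )

  // Each map entry becomes its own (name, key, value) row
  val keyValueRows: Seq[(String, String, String)] =
    rows.flatMap { case (name, m) => m.map { case (k, v) => (name, k, v) } }

  keyValueRows.foreach(println)
  assert(keyValueRows.size == 3) // 2 entries in the first map + 1 in the second
}
```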
Complete Example
package com.sparkbyexamples.spark.dataframe.functions.collection

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.explode
import org.apache.spark.sql.types._

object ArrayOfMapType extends App {

  val spark = SparkSession.builder()
    .appName("SparkByExamples.com")
    .master("local[1]")
    .getOrCreate()

  // Schema: a name column plus an array of string-to-string maps
  val arrayMapSchema = new StructType()
    .add("name", StringType)
    .add("properties", ArrayType(MapType(StringType, StringType, true)))

  val arrayMapData = Seq(
    Row("James", List(Map("hair" -> "black", "eye" -> "brown"), Map("height" -> "5.9"))),
    Row("Michael", List(Map("hair" -> "brown", "eye" -> "black"), Map("height" -> "6"))),
    Row("Robert", List(Map("hair" -> "red", "eye" -> "gray"), Map("height" -> "6.3")))
  )

  val df = spark.createDataFrame(
    spark.sparkContext.parallelize(arrayMapData), arrayMapSchema)
  df.printSchema()
  df.show(false)

  // Explode the array column: one output row per map element
  import spark.implicits._
  val df2 = df.select($"name", explode($"properties"))
  df2.printSchema()
  df2.show(false)
}
Happy Learning !!