Spark – How to Convert Map into Multiple Columns

  • Post category:Apache Spark
  • Post last modified:October 30, 2022

In this Spark DataFrame article, I will explain how to convert a map (MapType) column into multiple columns (one column for each map key) using a Scala example.

Spark provides several map functions to get the keys and values of a map column, and the Column class also has a few methods for working with MapType. Let's see these functions with examples.

Before we proceed with an example of how to convert a map type column into multiple columns, let's first create a DataFrame.


  import org.apache.spark.sql.{Row, SparkSession}
  import org.apache.spark.sql.functions._
  import org.apache.spark.sql.types._

  val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExamples.com")
    .getOrCreate()
  import spark.implicits._

  // Creating a DataFrame with a MapType column
  val arrayStructureData = Seq(
    Row("James", Map("hair" -> "black", "eye" -> "brown")),
    Row("Michael", Map("hair" -> "gray", "eye" -> "black")),
    Row("Robert", Map("hair" -> "brown"))
  )

  val arrayStructureSchema = new StructType()
    .add("name", StringType)
    .add("property", MapType(StringType, StringType))

  val mapTypeDF = spark.createDataFrame(
    spark.sparkContext.parallelize(arrayStructureData), arrayStructureSchema)
  mapTypeDF.printSchema()
  mapTypeDF.show(false)

This yields the output below.


root
 |-- name: string (nullable = true)
 |-- property: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

+-------+-----------------------------+
|name   |property                     |
+-------+-----------------------------+
|James  |[hair -> black, eye -> brown]|
|Michael|[hair -> gray, eye -> black] |
|Robert |[hair -> brown]              |
+-------+-----------------------------+
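Before extracting specific keys, you can inspect a map column with the built-in map_keys() and map_values() functions, which return the map's keys and values as array columns. A quick sketch against the DataFrame above (the column aliases "keys" and "values" are my own):

```scala
import org.apache.spark.sql.functions.{col, map_keys, map_values}

// map_keys/map_values return array columns built from each row's map
mapTypeDF.select(col("name"),
  map_keys(col("property")).as("keys"),
  map_values(col("property")).as("values"))
  .show(false)
```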

From the above DataFrame, let's convert the map values of the property column into individual columns and name them “hair_color” and “eye_color”.

By using getItem() from the org.apache.spark.sql.Column class, we can get the value for a map key. This method takes a map key string as a parameter. Let's use it to extract the value for each key from the map.

Note that in order to use this function, you need to know in advance which keys you want to extract from the MapType column.


 mapTypeDF.select(col("name"),
   col("property").getItem("hair").as("hair_color"),
   col("property").getItem("eye").as("eye_color"))
   .show(false)

This yields the output below with the new columns hair_color and eye_color.


+-------+----------+---------+
|name   |hair_color|eye_color|
+-------+----------+---------+
|James  |black     |brown    |
|Michael|gray      |black    |
|Robert |brown     |null     |
+-------+----------+---------+

Note that if a key is not present in a row's map, getItem() returns a null value for that row.
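If a null is not what you want, you can substitute a default value with coalesce() and lit(). A minimal sketch against the same DataFrame (the default string "unknown" is an arbitrary placeholder of my own, not part of the original example):

```scala
import org.apache.spark.sql.functions.{coalesce, col, lit}

// coalesce returns the first non-null column, so missing map keys
// become "unknown" instead of null
mapTypeDF.select(col("name"),
  coalesce(col("property").getItem("hair"), lit("unknown")).as("hair_color"),
  coalesce(col("property").getItem("eye"), lit("unknown")).as("eye_color"))
  .show(false)
```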

If your keys are dynamic and you want to automatically find all keys and convert them into columns, you can collect the distinct keys first and build the select expressions from them.


  import spark.implicits._
  val keysDF = mapTypeDF.select(explode(map_keys($"property"))).distinct()
  val keys = keysDF.collect().map(f=>f.get(0))
  val keyCols = keys.map(f=> col("property").getItem(f).as(f.toString))
  mapTypeDF.select(col("name") +: keyCols:_*).show(false)

This yields the output below.


+-------+-----+-----+
|name   |hair |eye  |
+-------+-----+-----+
|James  |black|brown|
|Michael|gray |black|
|Robert |brown|null |
+-------+-----+-----+
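An alternative to collecting the keys yourself is to explode the map and pivot on the keys. explode() on a map column produces one row per entry with columns "key" and "value", and pivot() turns each distinct key back into its own column. A sketch under the same DataFrame (the exact shape here is my own, not from the original article):

```scala
import org.apache.spark.sql.functions.{col, explode, first}

// One row per (name, key, value), then pivot the keys into columns;
// first("value") picks the single value per (name, key) pair
mapTypeDF
  .select(col("name"), explode(col("property")))
  .groupBy("name")
  .pivot("key")
  .agg(first("value"))
  .show(false)
```

Note that pivot() triggers an extra job to determine the distinct keys unless you pass them explicitly, so for large data the map_keys() approach above may be cheaper.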

Complete Example

Below is a complete example of how to convert a Spark DataFrame MapType column into multiple columns.


package com.sparkbyexamples.spark.dataframe.functions.collection

import org.apache.spark.sql.functions._
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

object MapToColumn extends App {

  val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExamples.com")
    .getOrCreate()

  val arrayStructureData = Seq(
    Row("James",Map("hair"->"black","eye"->"brown")),
    Row("Michael",Map("hair"->"gray","eye"->"black")),
    Row("Robert",Map("hair"->"brown"))
  )

  val arrayStructureSchema = new StructType()
    .add("name",StringType)
    .add("property", MapType(StringType,StringType))

  val mapTypeDF = spark.createDataFrame(
    spark.sparkContext.parallelize(arrayStructureData),arrayStructureSchema)
  mapTypeDF.printSchema()
  mapTypeDF.show(false)

  mapTypeDF.select(col("name"),
    col("property").getItem("hair").as("hair_color"),
    col("property").getItem("eye").as("eye_color"))
    .show(false)

  import spark.implicits._
  val keysDF = mapTypeDF.select(explode(map_keys($"property"))).distinct()
  val keys = keysDF.collect().map(f=>f.get(0))
  val keyCols = keys.map(f=> col("property").getItem(f).as(f.toString))
  mapTypeDF.select(col("name") +: keyCols:_*).show(false)
}

This example is also available at the Spark Examples GitHub project for reference.

Happy Learning !!

NNK

SparkByExamples.com is a Big Data and Spark examples community page; all examples are simple, easy to understand, and well tested in our development environment.


This Post Has 2 Comments

  1. Guy

    Thanks for that. I feel like “pivot” should be one of the answers, or at least explain it can be done using this mechanism and reference the relevant page.

  2. Satya

    Super helpful!!
