Spark – How to Convert Map into Multiple Columns

In this Spark DataFrame article, I will explain how to convert a map (MapType) column into multiple columns (one column for each map key) using a Scala example.

Spark provides several map functions to get the keys and values of a map column, and the Column class also has a few methods for working with MapType. Let’s see these functions with examples.

Before we proceed with an example of how to convert a map type column into multiple columns, let’s first create a DataFrame.


  import org.apache.spark.sql.functions._
  import org.apache.spark.sql.types._
  import org.apache.spark.sql.{Row, SparkSession}

  val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExamples.com")
    .getOrCreate()
  import spark.implicits._
  // Creating DF with MapType
  val arrayStructureData = Seq(
    Row("James",Map("hair"->"black","eye"->"brown")),
    Row("Michael",Map("hair"->"gray","eye"->"black")),
    Row("Robert",Map("hair"->"brown"))
  )

  // The MapType can be created with the DataTypes factory (below)
  // or with the MapType(StringType, StringType) constructor directly
  val mapType = DataTypes.createMapType(StringType, StringType)

  val arrayStructureSchema = new StructType()
    .add("name", StringType)
    .add("property", mapType)

  val mapTypeDF = spark.createDataFrame(
    spark.sparkContext.parallelize(arrayStructureData),arrayStructureSchema)
  mapTypeDF.printSchema()
  mapTypeDF.show(false)

This yields the below output.


// Output:
root
 |-- name: string (nullable = true)
 |-- property: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

+-------+-----------------------------+
|name   |property                     |
+-------+-----------------------------+
|James  |[hair -> black, eye -> brown]|
|Michael|[hair -> gray, eye -> black] |
|Robert |[hair -> brown]              |
+-------+-----------------------------+
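
Before flattening the map, it is worth knowing the built-in map inspection functions. The following is a minimal sketch (not part of the original example) that uses map_keys() and map_values() on the DataFrame created above:

  // map_keys() and map_values() return each row's map keys and
  // values as arrays, which is handy for inspecting the column.
  mapTypeDF.select(col("name"),
    map_keys(col("property")).as("keys"),
    map_values(col("property")).as("values"))
    .show(false)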

From the above DataFrame, let’s convert the map values of the property column into individual columns and name them “hair_color” and “eye_color”.

By using getItem() of the org.apache.spark.sql.Column class, we can get the value for a map key. This method takes the map key as a parameter. Using this, let’s extract the value for each key from the map.

Note that in order to use this function, you need to know in advance the keys you want to extract from the MapType column.


  // Extract the values for the known keys "hair" and "eye"
  mapTypeDF.select(col("name"),
    col("property").getItem("hair").as("hair_color"),
    col("property").getItem("eye").as("eye_color"))
    .show(false)

This yields the below output, with the new columns hair_color and eye_color.


// Output:
+-------+----------+---------+
|name   |hair_color|eye_color|
+-------+----------+---------+
|James  |black     |brown    |
|Michael|gray      |black    |
|Robert |brown     |null     |
+-------+----------+---------+

Note that if a key is not present in a row, the getItem() function returns a null value.
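
If you would rather substitute a default value than keep the null, one option (a sketch, not from the original article; "unknown" is just an illustrative default) is to wrap getItem() in coalesce():

  // Fall back to a default when the "eye" key is missing from the map
  mapTypeDF.select(col("name"),
    col("property").getItem("hair").as("hair_color"),
    coalesce(col("property").getItem("eye"), lit("unknown")).as("eye_color"))
    .show(false)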

If your keys are dynamic and you want to automatically find all the keys and convert them into columns, you can first collect the distinct keys and then build the select expressions from them.


  import spark.implicits._
  // Collect all distinct keys that appear in the map column
  val keysDF = mapTypeDF.select(explode(map_keys($"property"))).distinct()
  val keys = keysDF.collect().map(f => f.get(0))
  // Build one column per key, named after the key itself
  val keyCols = keys.map(f => col("property").getItem(f).as(f.toString))
  mapTypeDF.select(col("name") +: keyCols: _*).show(false)

This yields the below output. Note that the collect() on the keys runs a Spark job, because the full list of columns must be known on the driver before the final select can be built.


// Output:
+-------+-----+-----+
|name   |hair |eye  |
+-------+-----+-----+
|James  |black|brown|
|Michael|gray |black|
|Robert |brown|null |
+-------+-----+-----+
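
As an alternative to collecting the keys and building the select list yourself, the same result can be produced by exploding the map into key/value rows and pivoting on the key column. This is a sketch of that approach, assuming at most one value per key and name:

  // explode() on a map produces one row per entry with "key" and "value"
  // columns; pivot() then turns the distinct keys back into columns.
  mapTypeDF.select(col("name"), explode(col("property")))
    .groupBy("name")
    .pivot("key")
    .agg(first("value"))
    .show(false)

Note that pivot() without an explicit list of values also runs a job to discover the distinct keys, so its cost is comparable to the collect()-based approach above.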

Complete Example

Below is a complete example of how to convert a Spark DataFrame MapType column into multiple columns.


package com.sparkbyexamples.spark.dataframe.functions.collection

import org.apache.spark.sql.functions._
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

object MapToColumn extends App {

  val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExamples.com")
    .getOrCreate()

  val arrayStructureData = Seq(
    Row("James",Map("hair"->"black","eye"->"brown")),
    Row("Michael",Map("hair"->"gray","eye"->"black")),
    Row("Robert",Map("hair"->"brown"))
  )

  // The MapType can be created with the DataTypes factory (below)
  // or with the MapType(StringType, StringType) constructor directly
  val mapType = DataTypes.createMapType(StringType, StringType)

  val arrayStructureSchema = new StructType()
    .add("name", StringType)
    .add("property", mapType)

  val mapTypeDF = spark.createDataFrame(
    spark.sparkContext.parallelize(arrayStructureData),arrayStructureSchema)
  mapTypeDF.printSchema()
  mapTypeDF.show(false)

  // Extract the values for the known keys "hair" and "eye"
  mapTypeDF.select(col("name"),
    col("property").getItem("hair").as("hair_color"),
    col("property").getItem("eye").as("eye_color"))
    .show(false)

  // Dynamically find all distinct keys and convert them into columns
  import spark.implicits._
  val keysDF = mapTypeDF.select(explode(map_keys($"property"))).distinct()
  val keys = keysDF.collect().map(f => f.get(0))
  val keyCols = keys.map(f => col("property").getItem(f).as(f.toString))
  mapTypeDF.select(col("name") +: keyCols: _*).show(false)
}

This example is also available at the Spark Examples GitHub project for reference.

Happy Learning !!

