Convert Struct to a Map Type in Spark


Let’s say you have the following Spark DataFrame with a StructType (struct) column “properties”, and you want to convert that struct column to a Map (MapType) column.


import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StringType, IntegerType}

val structureData = Seq(
    Row("36636","Finance",Row(3000,"USA")),
    Row("40288","Finance",Row(5000,"IND")),
    Row("42114","Sales",Row(3900,"USA")),
    Row("39192","Marketing",Row(2500,"CAN")),
    Row("34534","Sales",Row(6500,"USA")))

val structureSchema = new StructType()
    .add("id", StringType)
    .add("dept", StringType)
    .add("properties", new StructType()
        .add("salary", IntegerType)
        .add("location", StringType))

var df = spark.createDataFrame(
    spark.sparkContext.parallelize(structureData), structureSchema)

df.printSchema()
df.show(false)

This outputs the following Spark schema and DataFrame.


root
 |-- id: string (nullable = true)
 |-- dept: string (nullable = true)
 |-- properties: struct (nullable = true)
 |    |-- salary: integer (nullable = true)
 |    |-- location: string (nullable = true)

+-----+---------+-----------+
|id   |dept     |properties |
+-----+---------+-----------+
|36636|Finance  |[3000, USA]|
|40288|Finance  |[5000, IND]|
|42114|Sales    |[3900, USA]|
|39192|Marketing|[2500, CAN]|
|34534|Sales    |[6500, USA]|
+-----+---------+-----------+

Now I want to convert the StructType column “properties” to a MapType column on the DataFrame, as shown below.


+-----+---------+---------------------------------+
|id   |dept     |propertiesMap                    |
+-----+---------+---------------------------------+
|36636|Finance  |[salary -> 3000, location -> USA]|
|40288|Finance  |[salary -> 5000, location -> IND]|
|42114|Sales    |[salary -> 3900, location -> USA]|
|39192|Marketing|[salary -> 2500, location -> CAN]|
|34534|Sales    |[salary -> 6500, location -> USA]|
+-----+---------+---------------------------------+

Solution: By using the map() SQL function you can create a Map type column. To convert, first collect the struct’s field names and values as alternating key/value Column pairs, then pass them as a sequence to the map() function.


import scala.collection.mutable
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, lit, map}

val index = df.schema.fieldIndex("properties")
val propSchema = df.schema(index).dataType.asInstanceOf[StructType]
val columns = mutable.LinkedHashSet[Column]()
propSchema.fields.foreach(field => {
  columns.add(lit(field.name))                  // key: the field name as a literal
  columns.add(col("properties." + field.name))  // value: the field's column
})

df = df.withColumn("propertiesMap",map(columns.toSeq:_*))
df = df.drop("properties")
df.printSchema()
df.show(false)

This yields the below output.


root
 |-- id: string (nullable = true)
 |-- dept: string (nullable = true)
 |-- propertiesMap: map (nullable = false)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

+-----+---------+---------------------------------+
|id   |dept     |propertiesMap                    |
+-----+---------+---------------------------------+
|36636|Finance  |[salary -> 3000, location -> USA]|
|40288|Finance  |[salary -> 5000, location -> IND]|
|42114|Sales    |[salary -> 3900, location -> USA]|
|39192|Marketing|[salary -> 2500, location -> CAN]|
|34534|Sales    |[salary -> 6500, location -> USA]|
+-----+---------+---------------------------------+
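If the struct fields are known in advance, you can skip the schema iteration above. The following sketch (not from the original example, and assuming the same `df` built earlier) shows two alternatives: passing literal key/column pairs to map() directly, and round-tripping the struct through JSON with to_json() and from_json().

```scala
import org.apache.spark.sql.functions.{col, lit, map, to_json, from_json}
import org.apache.spark.sql.types.{MapType, StringType}

// Option 1: list the key/value pairs explicitly when field names are known.
val dfStatic = df.withColumn("propertiesMap",
  map(
    lit("salary"),   col("properties.salary"),
    lit("location"), col("properties.location")
  )).drop("properties")

// Option 2: serialize the struct to a JSON string, then parse it back
// as a map. Note this casts all values (including salary) to string.
val dfJson = df.withColumn("propertiesMap",
  from_json(to_json(col("properties")), MapType(StringType, StringType))
).drop("properties")

dfStatic.show(false)
dfJson.show(false)
```

Option 1 keeps the original value types until map() coerces them to a common type, while Option 2 always yields string values; pick whichever fits your downstream schema.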

Happy Learning !!

Naveen (NNK)
