Convert Struct to a Map Type in Spark

Let’s say you have the following Spark DataFrame that has StructType (struct) column “properties” and you wanted to convert Struct to Map (MapType) column.


val structureData = Seq( Row("36636","Finance",Row(3000,"USA")), 
    Row("40288","Finance",Row(5000,"IND")), 
    Row("42114","Sales",Row(3900,"USA")), 
    Row("39192","Marketing",Row(2500,"CAN")), 
    Row("34534","Sales",Row(6500,"USA")) ) 
val structureSchema = new StructType().add("id",StringType)
    .add("dept",StringType)
    .add("properties",new StructType()
    .add("salary",IntegerType)
    .add("location",StringType) ) 
var df = spark.createDataFrame( spark.sparkContext.parallelize(structureData),structureSchema)

df.printSchema() df.show(false)

This output the following Spark schema and data frame.


root
 |-- id: string (nullable = true)
 |-- dept: string (nullable = true)
 |-- properties: struct (nullable = true)
 |    |-- salary: integer (nullable = true)
 |    |-- location: string (nullable = true)

+-----+---------+-----------+
|id   |dept     |properties |
+-----+---------+-----------+
|36636|Finance  |[3000, USA]|
|40288|Finance  |[5000, IND]|
|42114|Sales    |[3900, USA]|
|39192|Marketing|[2500, CAN]|
|34534|Sales    |[6500, USA]|
+-----+---------+-----------+

Now I want to convert StructType column “properties” to a MapType column on DataFrame as shown below.


+-----+---------+---------------------------------+
|id   |dept     |propertiesMap                    |
+-----+---------+---------------------------------+
|36636|Finance  |[salary -> 3000, location -> USA]|
|40288|Finance  |[salary -> 5000, location -> IND]|
|42114|Sales    |[salary -> 3900, location -> USA]|
|39192|Marketing|[salary -> 2500, location -> CAN]|
|34534|Sales    |[salary -> 6500, location -> USA]|
+-----+---------+---------------------------------+

Solution: By using the map() sql function you can create a Map type. In order to convert, first, you need to collect all the columns in a struct type and pass them as a list to this map() function.


val index = df.schema.fieldIndex("properties")
val propSchema = df.schema(index).dataType.asInstanceOf[StructType]
var columns = mutable.LinkedHashSet[Column]()
propSchema.fields.foreach(field =>{
  columns.add(lit(field.name))
  columns.add(col("properties." + field.name))
})

df = df.withColumn("propertiesMap",map(columns.toSeq:_*))
df = df.drop("properties")
df.printSchema()
df.show(false)

This yields below output.


root
 |-- id: string (nullable = true)
 |-- dept: string (nullable = true)
 |-- propertiesMap: map (nullable = false)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

+-----+---------+---------------------------------+
|id   |dept     |propertiesMap                    |
+-----+---------+---------------------------------+
|36636|Finance  |[salary -> 3000, location -> USA]|
|40288|Finance  |[salary -> 5000, location -> IND]|
|42114|Sales    |[salary -> 3900, location -> USA]|
|39192|Marketing|[salary -> 2500, location -> CAN]|
|34534|Sales    |[salary -> 6500, location -> USA]|
+-----+---------+---------------------------------+

Happy Learning !!

spark convert struct map type

NNK

SparkByExamples.com is a Big Data and Spark examples community page, all examples are simple and easy to understand and well tested in our development environment Read more ..

Leave a Reply

You are currently viewing Convert Struct to a Map Type in Spark