Spark from_json() – Convert JSON Column to Struct, Map or Multiple Columns

  • Post category:Apache Spark
  • Post last modified:April 8, 2021

In Spark/PySpark, the from_json() SQL function is used to convert a JSON string from a DataFrame column into a struct column, a map (MapType) column, or multiple columns.

1. Spark from_json() Syntax

Following are the different syntaxes of from_json() function.

from_json(Column jsonStringcolumn, Column schema)
from_json(Column jsonStringcolumn, DataType schema)
from_json(Column jsonStringcolumn, StructType schema)
from_json(Column jsonStringcolumn, DataType schema, Map options)
from_json(Column jsonStringcolumn, Column schema, Map options)
from_json(Column jsonStringcolumn, StructType schema, Map options)
and a few more overloads that accept a java.util.Map for options.
  • jsonStringcolumn – DataFrame column containing the JSON string.
  • schema – JSON schema; supports DataType, Column, String, or StructType.
  • options – Optional JSON parsing options. Accepts the same options as the JSON data source.

2. Spark from_json() Usage Example

Let’s create a DataFrame with a column containing a JSON string. In the next sections, I will parse this column and convert it to MapType (map), struct, and multiple columns using the from_json() function.

import spark.implicits._
val jsonString="""{"Zipcode":704,"ZipCodeType":"STANDARD","City":"PARC PARQUE","State":"PR"}"""
val data = Seq((1, jsonString))
val df=data.toDF("id","value")
df.show(false)

|id |value                                                                     |
|1  |{"Zipcode":704,"ZipCodeType":"STANDARD","City":"PARC PARQUE","State":"PR"}|

2.1 Spark Convert JSON Column to Map type Column

By using the syntax from_json(Column jsonStringcolumn, DataType schema), you can convert a Spark DataFrame column containing a JSON string into a MapType (map) column. MapType is a subclass of DataType.

import org.apache.spark.sql.functions.{from_json,col}
import org.apache.spark.sql.types.{MapType, StringType}
val df2=df.withColumn("value",from_json(col("value"),MapType(StringType,StringType)))
df2.printSchema()
df2.show(false)

 |-- id: integer (nullable = false)
 |-- value: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

|id |value                                                                      |
|1  |[Zipcode -> 704, ZipCodeType -> STANDARD, City -> PARC PARQUE, State -> PR]|

Note that by using withColumn() I have converted the value column from JSON string to MapType.
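To see the shape of the result outside Spark, here is a plain-Python illustration (using the standard json module, not Spark code) of what the MapType conversion produces. Since both the key and value types are StringType, every value is coerced to a string:

```python
import json

# The same JSON string used in the DataFrame above
json_string = '{"Zipcode":704,"ZipCodeType":"STANDARD","City":"PARC PARQUE","State":"PR"}'

# MapType(StringType, StringType) yields a string -> string map,
# so the numeric 704 comes back as the string "704"
as_map = {k: str(v) for k, v in json.loads(json_string).items()}
print(as_map)
# {'Zipcode': '704', 'ZipCodeType': 'STANDARD', 'City': 'PARC PARQUE', 'State': 'PR'}
```

This is why the map column above shows Zipcode -> 704 with a string value type, even though the JSON carried a number.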

2.2 Spark Convert JSON Column to struct Column

Now, by using from_json(Column jsonStringcolumn, StructType schema), you can convert the JSON string column on the Spark DataFrame to a struct type. To do so, first create a StructType that matches the JSON string.

import org.apache.spark.sql.types.{StringType, StructType}
val schema = new StructType()
    .add("Zipcode", StringType, true)
    .add("ZipCodeType", StringType, true)
    .add("City", StringType, true)
    .add("State", StringType, true)

Let’s use this schema on from_json().

val df4=df.withColumn("value",from_json(col("value"),schema))
df4.printSchema()
df4.show(false)

 |-- id: integer (nullable = false)
 |-- value: struct (nullable = true)
 |    |-- Zipcode: string (nullable = true)
 |    |-- ZipCodeType: string (nullable = true)
 |    |-- City: string (nullable = true)
 |    |-- State: string (nullable = true)

|id |value                        |
|1  |[704,STANDARD,PARC PARQUE,PR]|
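In plain-Python terms (an illustration with the standard json module, not Spark code), a struct differs from a map in that it has a fixed, ordered set of named fields taken from the schema; any field missing from the JSON document comes back as null:

```python
import json

# Field names and order come from the StructType schema, not from the JSON
schema_fields = ["Zipcode", "ZipCodeType", "City", "State"]
json_string = '{"Zipcode":704,"ZipCodeType":"STANDARD","City":"PARC PARQUE","State":"PR"}'

parsed = json.loads(json_string)
# Fields absent from the document would come back as null (None here)
struct_value = tuple(str(parsed[f]) if f in parsed else None for f in schema_fields)
print(struct_value)
# ('704', 'STANDARD', 'PARC PARQUE', 'PR')
```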

2.3 Spark Convert JSON Column to Multiple Columns

Finally, let’s convert the value struct to individual columns by selecting value.* on the struct column.

val df5=df4.select(col("id"),col("value.*"))
df5.printSchema()
df5.show(false)

 |-- id: integer (nullable = false)
 |-- Zipcode: string (nullable = true)
 |-- ZipCodeType: string (nullable = true)
 |-- City: string (nullable = true)
 |-- State: string (nullable = true)

| id|Zipcode|ZipCodeType|       City|State|
|  1|    704|   STANDARD|PARC PARQUE|   PR|
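Selecting value.* promotes each struct field to its own top-level column. As a plain-Python sketch of that flattening step (an illustration, not Spark code), the (id, json) pair becomes one flat row:

```python
import json

json_string = '{"Zipcode":704,"ZipCodeType":"STANDARD","City":"PARC PARQUE","State":"PR"}'
fields = ["Zipcode", "ZipCodeType", "City", "State"]

parsed = json.loads(json_string)
# The id column is kept as-is; each struct field becomes its own column
row = (1,) + tuple(str(parsed[f]) for f in fields)
print(row)
# (1, '704', 'STANDARD', 'PARC PARQUE', 'PR')
```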

3. PySpark from_json() Syntax

Following is the syntax of the PySpark from_json() function.

def from_json(col, schema, options={})
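One behavioral detail worth knowing: from_json() does not fail the job on malformed input; it returns null for any row whose string cannot be parsed against the schema. A plain-Python sketch of that contract (using the standard json module, not Spark code):

```python
import json

def parse_or_none(s):
    # from_json-like behavior: unparseable input yields null (None), not an error
    try:
        return json.loads(s)
    except (ValueError, TypeError):
        return None

print(parse_or_none('{"State":"PR"}'))   # {'State': 'PR'}
print(parse_or_none('not valid json'))   # None
```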

4. PySpark from_json() Usage Example

Since I have already explained above how to parse a JSON string column and convert it to MapType, struct type, and multiple columns, for PySpark I will just provide the complete example.

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('').getOrCreate()

#DataFrame with a JSON string column
jsonString="""{"Zipcode":704,"ZipCodeType":"STANDARD","City":"PARC PARQUE","State":"PR"}"""
df=spark.createDataFrame([(1, jsonString)],["id","value"])

#Convert JSON string column to Map type
from pyspark.sql.types import MapType,StringType
from pyspark.sql.functions import col,from_json
df2=df.withColumn("value",from_json(col("value"),MapType(StringType(),StringType())))
df2.printSchema()
df2.show(truncate=False)

#Create schema for JSON
from pyspark.sql.types import StructType,StructField
schema = StructType([
    StructField("Zipcode", StringType(), True),
    StructField("ZipCodeType", StringType(), True),
    StructField("City", StringType(), True),
    StructField("State", StringType(), True)
])

#Convert JSON string column to struct type
df3 = df.withColumn("value",from_json(col("value"),schema))
df3.printSchema()
df3.show(truncate=False)

#Convert struct type to multiple columns
df4 = df3.select("id", "value.*")
df4.printSchema()
df4.show(truncate=False)

Happy Learning !!

