Spark SQL provides Encoders to convert a case class to a Spark schema (StructType object). If you are using an older version of Spark, you can also create the schema from a case class using a Scala reflection hack. Both options are explained here with examples.
First, let’s create the case classes “Name” and “Employee”.
case class Name(first: String, last: String, middle: String)
case class Employee(fullName: Name, age: Integer, gender: String)
1. Using Spark Encoders to convert case class to schema
The Encoders object has a product method that takes the case class “Employee” as a type parameter; calling schema on the resulting encoder returns the schema of the Employee class.
// Using Spark Encoders to convert case class to schema
import org.apache.spark.sql.Encoders

val encoderSchema = Encoders.product[Employee].schema
encoderSchema.printTreeString()
printTreeString() on the schema object outputs the schema below.
// Output:
root
|-- fullName: struct (nullable = true)
| |-- first: string (nullable = true)
| |-- last: string (nullable = true)
| |-- middle: string (nullable = true)
|-- age: integer (nullable = true)
|-- gender: string (nullable = true)
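The derived StructType can be used anywhere Spark expects an explicit schema. Below is a minimal, illustrative sketch (assuming a SparkSession named spark is available, and using made-up sample values) that pairs the schema with Row data to create a DataFrame.

// Illustrative sketch: using the derived schema to create a DataFrame
// (assumes a SparkSession named `spark`; values are sample data)
import org.apache.spark.sql.Row

val rows = java.util.Arrays.asList(
  Row(Row("James", "Smith", "A"), 30, "M")
)
val df = spark.createDataFrame(rows, encoderSchema)
df.printSchema()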
2. Using Scala code to create schema from case class
We can also create a Spark schema from a case class using plain Scala code, without Spark SQL Encoders. To do the conversion, we use the ScalaReflection class and its schemaFor method.
// Using Scala code to create schema from case class
import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types.StructType

val schema = ScalaReflection.schemaFor[Employee].dataType.asInstanceOf[StructType]
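Note that ScalaReflection lives in the org.apache.spark.sql.catalyst package, which is an internal API and may change between Spark versions. As a quick check, printing the tree shows the same structure as the Encoders approach.

// Prints the same tree as the Encoders-based schema above
schema.printTreeString()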
3. Converting SQL DDL String to the schema
Like loading a structure from a JSON string, we can also create a StructType from a DDL string, and you can generate a DDL string from an existing schema using toDDL(). Calling printTreeString() on the struct object prints the schema, similar to what the printSchema() function returns.
// Converting SQL DDL String to the schema
val ddlSchemaStr = "`fullName` STRUCT<`first`: STRING, `last`: STRING, `middle`: STRING>,`age` INT,`gender` STRING"
val ddlSchema = StructType.fromDDL(ddlSchemaStr)
ddlSchema.printTreeString()
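Going the other direction, toDDL() (available in Spark 2.4+) serializes a StructType back into a DDL string. A minimal sketch, reusing the encoderSchema from section 1:

// Generate a DDL string from an existing schema
println(encoderSchema.toDDL)
// Prints something like:
// `fullName` STRUCT<`first`: STRING, `last`: STRING, `middle`: STRING>,`age` INT,`gender` STRING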
This complete example is available for download at the Spark-Scala-Examples GitHub project.
4. Complete Code
package com.sparkbyexamples.spark.dataframe

import org.apache.spark.sql.Encoders
import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types.StructType

object CaseClassSparkSchema extends App {

  case class Name(first: String, last: String, middle: String)
  case class Employee(fullName: Name, age: Integer, gender: String)

  // 1. Using Spark Encoders to convert the case class to a schema
  val encoderSchema = Encoders.product[Employee].schema
  encoderSchema.printTreeString()

  // 2. Using Scala reflection to create the schema
  val schema = ScalaReflection.schemaFor[Employee].dataType.asInstanceOf[StructType]
  schema.printTreeString()
}
Happy Learning !!