Spark SQL DataType class is the base class of all data types in Spark, defined in the package org.apache.spark.sql.types. Data types are primarily used when working with DataFrames. In this article, you will learn the different data types and their utility methods with Scala examples.
1. Spark SQL DataType – base class of all Data Types
All data types in the table below are supported in Spark SQL, and the DataType class is the base class of all of them. Some types like IntegerType, DecimalType, ByteType, etc. are subclasses of NumericType, which is itself a subclass of DataType.
| | |
| --- | --- |
| StringType | ShortType |
| ArrayType | IntegerType |
| MapType | LongType |
| StructType | FloatType |
| DateType | DoubleType |
| TimestampType | DecimalType |
| BooleanType | ByteType |
| CalendarIntervalType | HiveStringType |
| BinaryType | ObjectType |
| NumericType | NullType |
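As a quick illustration of this hierarchy, the numeric types can all be matched against NumericType (a minimal sketch):

```scala
import org.apache.spark.sql.types._

// IntegerType, DecimalType, ByteType etc. are subclasses of NumericType,
// which in turn is a subclass of DataType
val types: Seq[DataType] = Seq(IntegerType, DecimalType(10, 2), ByteType, StringType)
val numeric = types.collect { case n: NumericType => n }
println(numeric) // List(IntegerType, DecimalType(10,2), ByteType) -- StringType is filtered out
```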
1.1 DataType common methods
All Spark SQL data types extend the DataType class and provide implementations for the methods shown in this example.
val arrayType = ArrayType(StringType, true)
println("json() : "+arrayType.json) // Represents json string of datatype
println("prettyJson() : "+arrayType.prettyJson) // Gets json in pretty format
println("simpleString() : "+arrayType.simpleString) // simple string
println("sql() : "+arrayType.sql) // SQL format
println("typeName() : "+arrayType.typeName) // type name
println("catalogString() : "+arrayType.catalogString) // catalog string
println("defaultSize() : "+arrayType.defaultSize) // default size
Yields below output.
json() : {"type":"array","elementType":"string","containsNull":true}
prettyJson() : {
"type" : "array",
"elementType" : "string",
"containsNull" : true
}
simpleString() : array<string>
sql() : ARRAY<STRING>
typeName() : array
catalogString() : array<string>
defaultSize() : 20
Besides these, the DataType class has the following static methods.
1.2 DataType.fromJson()
If you have a JSON string and you want to convert it to a DataType, use fromJson(). For example, you may want to convert a JSON schema string into a StructType.
val typeFromJson = DataType.fromJson(
"""{"type":"array",
|"elementType":"string","containsNull":false}""".stripMargin)
println(typeFromJson.getClass)
val typeFromJson2 = DataType.fromJson("\"string\"")
println(typeFromJson2.getClass)
//This prints
class org.apache.spark.sql.types.ArrayType
class org.apache.spark.sql.types.StringType$
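As a sanity check, the json string produced by any data type can be fed back into fromJson() to reconstruct the same type (a small sketch):

```scala
import org.apache.spark.sql.types._

// Round-trip: DataType.fromJson() rebuilds the exact type from its json output
val original = MapType(StringType, IntegerType, valueContainsNull = false)
val restored = DataType.fromJson(original.json)
println(restored == original) // true
```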
1.3 DataType.fromDDL()
Like loading a structure from a JSON string, we can also create one from a DDL string using fromDDL().
val ddlSchemaStr = "`fullName` STRUCT<`first`: STRING, `last`: STRING," +
"`middle`: STRING>,`age` INT,`gender` STRING"
val ddlSchema = DataType.fromDDL(ddlSchemaStr)
println(ddlSchema.getClass)
// This prints
class org.apache.spark.sql.types.StructType
1.4 DataType.canWrite()
canWrite() checks whether values of one data type can safely be written to a column of another data type; Spark uses it internally when validating writes, and it may not be accessible outside Spark's own packages in all versions.
1.5 DataType.equalsStructurally()
equalsStructurally() returns true when two data types have the same structure, i.e. the same types in the same positions regardless of field names, optionally ignoring nullability; like canWrite(), it is intended mainly for Spark's internal use.
2. Use Spark SQL DataTypes class to get a type object
In order to get or create a specific data type, we should use the objects and factory methods provided by the org.apache.spark.sql.types.DataTypes class. For example, use the object DataTypes.StringType to get StringType, and the factory method DataTypes.createArrayType(StringType) to get an ArrayType of string.
//Below are some examples
val strType = DataTypes.StringType
val arrayType = DataTypes.createArrayType(StringType)
val structType = DataTypes.createStructType(
Array(DataTypes.createStructField("fieldName",StringType,true)))
3. StringType
StringType (org.apache.spark.sql.types.StringType) is used to represent string values. To create a string type, use either DataTypes.StringType or the StringType object; both of these return a StringType object.
val strType = DataTypes.StringType
println("json : "+strType.json)
println("prettyJson : "+strType.prettyJson)
println("simpleString : "+strType.simpleString)
println("sql : "+strType.sql)
println("typeName : "+strType.typeName)
println("catalogString : "+strType.catalogString)
println("defaultSize : "+strType.defaultSize)
Outputs
json : "string"
prettyJson : "string"
simpleString : string
sql : STRING
typeName : string
catalogString : string
defaultSize : 20
4. ArrayType
Use ArrayType to represent arrays in a DataFrame, and use either the factory method DataTypes.createArrayType() or the ArrayType() constructor to get an array object of a specific type.
On an ArrayType object you can access all methods defined in section 1.1 and, additionally, it provides containsNull, elementType, and productElement() to name a few.
val arr = ArrayType(IntegerType,false)
val arrayType = DataTypes.createArrayType(StringType,true)
println("containsNull : "+arrayType.containsNull)
println("elementType : "+arrayType.elementType)
println("productElement : "+arrayType.productElement(0))
Yields below output.
containsNull : true
elementType : StringType
productElement : StringType
For more examples and usage, please refer to Using ArrayType on DataFrame.
5. MapType
Use MapType to represent key-value pair maps in a DataFrame, and use either the factory method DataTypes.createMapType() or the MapType() constructor to get a map object with specific key and value types.
On a MapType object you can access all methods defined in section 1.1 and, additionally, it provides keyType, valueType, valueContainsNull, and productElement() to name a few.
val mapType1 = MapType(StringType,IntegerType)
val mapType = DataTypes.createMapType(StringType,IntegerType)
println("keyType() : "+mapType.keyType)
println("valueType() : "+mapType.valueType)
println("valueContainsNull() : "+mapType.valueContainsNull)
println("productElement(1) : "+mapType.productElement(1))
Yields below output.
keyType() : StringType
valueType() : IntegerType
valueContainsNull() : true
productElement(1) : IntegerType
For more examples and usage, please refer to Using MapType on DataFrame.
6. DateType
Use DateType (org.apache.spark.sql.types.DateType) to represent dates in a DataFrame, and use either DataTypes.DateType or the DateType object to get a date type.
On Date type object you can access all methods defined in section 1.1
7. TimestampType
Use TimestampType (org.apache.spark.sql.types.TimestampType) to represent timestamps in a DataFrame, and use either DataTypes.TimestampType or the TimestampType object to get a timestamp type.
On Timestamp type object you can access all methods defined in section 1.1
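Since these sections have no snippet of their own, here is a small sketch of the common methods on the date and timestamp types:

```scala
import org.apache.spark.sql.types._

// Date and timestamp types expose the same common methods from section 1.1
val dateType = DataTypes.DateType
val tsType   = DataTypes.TimestampType

println("date sql : " + dateType.sql)                     // DATE
println("date defaultSize : " + dateType.defaultSize)     // 4 - stored internally as an Int
println("timestamp sql : " + tsType.sql)                  // TIMESTAMP
println("timestamp defaultSize : " + tsType.defaultSize)  // 8 - stored internally as a Long
```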
8. StructType
Use StructType (org.apache.spark.sql.types.StructType) to define the nested structure or schema of a DataFrame, and use either DataTypes.createStructType() or the StructType() constructor to get a struct object.
A StructType object provides many functions like toDDL(), fields, fieldNames, and length to name a few.
//StructType
val structType = DataTypes.createStructType(
Array(DataTypes.createStructField("fieldName",StringType,true)))
val simpleSchema = StructType(Array(
StructField("name",StringType,true),
StructField("id", IntegerType, true),
StructField("gender", StringType, true),
StructField("salary", DoubleType, true)
))
val anotherSchema = new StructType()
.add("name",new StructType()
.add("firstname",StringType)
.add("lastname",StringType))
.add("id",IntegerType)
.add("salary",DoubleType)
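The StructType utility methods mentioned above can be exercised directly on a schema object; a quick sketch:

```scala
import org.apache.spark.sql.types._

val simpleSchema = StructType(Array(
  StructField("name", StringType, true),
  StructField("id", IntegerType, true)
))

println("fieldNames : " + simpleSchema.fieldNames.mkString(",")) // name,id
println("length : " + simpleSchema.length)                       // 2
println("toDDL : " + simpleSchema.toDDL)                         // e.g. `name` STRING,`id` INT
```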
For more examples and usage, please refer to StructType.
9. All other remaining Spark SQL Data Types
Similar to the types described above, for the rest of the data types use the appropriate method on the DataTypes class or the data type constructor to create an object of the desired type. All common methods described in section 1.1 are available on these types too.
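For instance, a parameterized type like DecimalType can be created through its factory method, and the common methods behave as described in section 1.1 (a minimal sketch):

```scala
import org.apache.spark.sql.types._

// DecimalType with a specific precision and scale via the DataTypes factory method
val decimalType = DataTypes.createDecimalType(10, 2) // precision 10, scale 2
println("simpleString : " + decimalType.simpleString) // decimal(10,2)
println("sql : " + decimalType.sql)                   // DECIMAL(10,2)
```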
Conclusion
In this article, you have learned about the different Spark SQL data types, the DataType and DataTypes classes, and their methods using Scala examples. I would recommend referring to the DataType and DataTypes API docs for more details.
Thanks for reading. If you like it, please do share the article by following the below social links and any comments or suggestions are welcome in the comments sections!
Happy Learning !!
Related Articles
- Spark SQL Explained with Examples
- Spark SQL datediff()
- Spark SQL Create a Table
- Spark SQL like() Using Wildcard Example
- Spark SQL – Select Columns From DataFrame
- Spark SQL Inner Join with Example
- Spark SQL Self Join With Example