In Spark you can get all DataFrame column names and types (DataType) by using df.dtypes and df.schema, where df is an object of DataFrame. Let’s see some examples of how to get the data type and column name of all columns, and the data type of a selected column by name, using Scala examples.
Related: Convert Column Data Type in Spark DataFrame
1. Spark Get All DataType & Column Names
First, let’s see how to get all data types (DataType) & column names using df.dtypes, where dtypes returns all Spark DataFrame columns as Array[(String, String)]. The first value in the tuple is the column name and the second value is the data type.
import spark.sqlContext.implicits._
val df = Seq((1,"Robert"), (2,"Julia")).toDF("id","name")
//Get All column names and their types
df.dtypes.foreach(f=>println(f._1+","+f._2))
//Prints below column name & data type
//id,IntegerType
//name,StringType
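Since dtypes returns an Array of (name, type) tuples, you can also convert it to a Map for quick lookups by column name. A small sketch; the typesByName value name is just illustrative.
//Build a column name -> type lookup from dtypes
val typesByName: Map[String, String] = df.dtypes.toMap
println(typesByName("id")) //IntegerType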
Similarly, you can also get the data type & name of all columns using df.schema. schema returns a StructType, which is an array of StructField, and by using its methods you can get the column name and its type; df.schema.fields returns Array[StructField].
//Get All column names and their types
df.schema.fields.foreach(f => println(f.name + "," + f.dataType))
This yields the same output as above.
2. Get DataType of a Specific Column Name
If you want to get the data type of a specific DataFrame column by name then use the below example.
//Get data type of a specific column
println(df.schema("name").dataType)
//Prints data type of a "name" column
//StringType
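Since dataType returns a DataType object, you can also pattern match on it when you need to branch on a column’s type. Below is a minimal sketch; the match cases are just examples.
import org.apache.spark.sql.types.{IntegerType, StringType}
//Branch on the data type of the "name" column
df.schema("name").dataType match {
  case StringType  => println("name is a string column")
  case IntegerType => println("name is an integer column")
  case other       => println(s"name has type $other")
}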
3. Get All Column Names
You can get all column names of a Spark DataFrame by using df.columns; it returns an array of column names as Array[String].
//Get All column names from DataFrame
val allColumnNames=df.columns
println(allColumnNames.mkString(","))
//Print all column names in comma separated string
// id,name
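Since columns is a plain Array[String], you can also use it for a quick existence check before referencing a column; a small sketch:
//Check whether the DataFrame has a "name" column
if (df.columns.contains("name"))
  println("DataFrame has a 'name' column")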
4. Get DataFrame Schema
As you would already know, use df.printSchema() to display column names and their types on the console; similarly, df.schema.printTreeString() also prints the schema to the console.
df.printSchema()
//root
// |-- id: integer (nullable = false)
// |-- name: string (nullable = true)
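If you want the schema tree as a String value (for example, to write it to a log) instead of printing it, df.schema.treeString returns the same representation; a small sketch:
//Capture the schema tree as a String instead of printing it
val schemaTree: String = df.schema.treeString
println(schemaTree)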
5. Get Column Nullable Property & Metadata
Let’s see how to check whether a column accepts null values (Nullable) and how to get the Metadata of the column.
//Check if the "name" column accepts null values
print(df.schema("name").nullable)
//Get the Metadata object of the "name" column
val metaData = df.schema("name").metadata
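Metadata is empty by default; you can attach your own entries using a MetadataBuilder and Column.as(alias, metadata), then read them back from the schema. A small sketch, where the "comment" key and its value are just illustrative:
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.MetadataBuilder

//Attach a custom metadata entry to the "name" column
val nameMeta = new MetadataBuilder()
  .putString("comment", "customer first name")
  .build()
val dfWithMeta = df.withColumn("name", col("name").as("name", nameMeta))

//Read the metadata entry back from the schema
println(dfWithMeta.schema("name").metadata.getString("comment"))
//customer first name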
6. Other ways to get DataFrame Schema
print(df.schema.toDDL)
// `id` INT,`name` STRING
print(df.schema.prettyJson)
//{
// "type" : "struct",
// "fields" : [ {
// "name" : "id",
// "type" : "integer",
// "nullable" : false,
// "metadata" : { }
// }, {
// "name" : "name",
// "type" : "string",
// "nullable" : true,
// "metadata" : { }
// } ]
//}
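The JSON form is also round-trippable: df.schema.json gives a compact JSON string, and DataType.fromJson parses it back into a StructType, which is handy for persisting a schema. A small sketch:
import org.apache.spark.sql.types.{DataType, StructType}

//Serialize the schema to compact JSON and parse it back
val schemaJson: String = df.schema.json
val restored = DataType.fromJson(schemaJson).asInstanceOf[StructType]
println(restored == df.schema) //true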
7. Conclusion
In summary, you can get the names and data types (DataType) of all DataFrame columns by using df.dtypes and df.schema, and you can also use several StructField methods to get additional details about Spark DataFrame columns.
Happy Learning !!
Related Articles
- Spark Get Current Number of Partitions of DataFrame
- Spark DataFrame count
- Spark groupByKey()
- Spark JDBC Parallel Read
- Spark Query Table using JDBC
- Spark Read and Write MySQL Database Table
- Spark spark.table() vs spark.read.table()
- Spark with SQL Server – Read and Write Table