Spark Check Column Data Type is Integer or String

When you are dealing with large datasets that contain different data types (DataType) in Spark, you often need to check the data type of a DataFrame column, and sometimes you need to get all integer or string type columns to perform certain operations.

Related: Different Ways to Get All Column Names & Data Types in Spark

Below is a quick snippet showing how to check if a DataFrame column data type is Integer (int) or String in Spark. If you want to get all column names of integer, string, or any specific type, read through the complete article for examples.

// Check if the 'name' column type is string
if (df.schema("name").dataType.typeName == "string")
   println(" name is 'string' column")

// Check if the 'id' column type is integer
if (df.schema("id").dataType.typeName == "integer")
   println(" id is 'integer' column")

Let’s see a detailed example. First, let’s create a DataFrame.

import spark.implicits._
val data = Seq((1,"Jeff","2012-04-14",2.34),
  (2,"Ram","2012-04-14",3.45),
  (3,"Scott","2012-04-14",4.56))
val df = data.toDF("id","name","dob","grade")
df.printSchema()

// root
// |-- id: integer (nullable = false)
// |-- name: string (nullable = true)
// |-- dob: string (nullable = true)
// |-- grade: double (nullable = false)

1. Check Data Type of DataFrame Column

To check the type of a specific DataFrame column, use df.schema, which returns all column names with their types, then look up the column by name to get its type. Refer to Spark Convert DataFrame Column Data Type.

// Check Data Type of DataFrame Column
if (df.schema("name").dataType.typeName == "string")
   println(" name is 'string' column")
if (df.schema("id").dataType.typeName == "integer")
   println(" id is 'integer' column")

2. Select All Column Names of String Type

Sometimes you may want to replace all string type columns with a specific value, for example, replace an empty string with a null value in Spark. To do so, you can use df.schema.fields to get all DataFrame columns and apply a filter to get only the string columns (a replacement sketch follows after the selection examples below).

// Select All Column Names of String Type
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StringType
val stringColumns = df.schema.fields.filter(_.dataType.isInstanceOf[StringType])
df.select(stringColumns.map(f => col(f.name)): _*).show()

// +-----+----------+
// | name|       dob|
// +-----+----------+
// | Jeff|2012-04-14|
// |  Ram|2012-04-14|
// |Scott|2012-04-14|
// +-----+----------+

Alternatively, you can also get the string columns by filtering on the type name.

// Get All String Columns
val stringColumns1 = df.schema.fields
      .filter(_.dataType.typeName == "string")
df.select(stringColumns1.map(f => col(f.name)): _*).show()

And another way to get all columns of string type is by using df.dtypes.

// Get All String Columns using df.dtypes
val stringColumns2 = df.dtypes.filter(_._2 == "StringType")
df.select(stringColumns2.map(x => col(x._1)): _*).show()
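With the string column names in hand, here is a minimal sketch of the replacement use case mentioned above: turning empty strings into nulls on every string column using when()/otherwise(). The foldLeft over withColumn is one way to do it, not the only one.

// Replace empty strings with null on all string columns (illustrative sketch)
import org.apache.spark.sql.functions.{col, lit, when}
val stringColNames = df.schema.fields
      .filter(_.dataType.typeName == "string").map(_.name)
val dfWithNulls = stringColNames.foldLeft(df)((acc, c) =>
      acc.withColumn(c, when(col(c) === "", lit(null)).otherwise(col(c))))
dfWithNulls.show()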

3. Select All Column Names of Integer Type

If you want to know all column names of Integer type, use the below example. You would typically need this when you want to replace all integer columns with specific values, etc. (see the fill sketch after the output below).

// Get All Integer Columns
import org.apache.spark.sql.types.IntegerType
val integerColumns = df.schema.fields
      .filter(_.dataType.isInstanceOf[IntegerType])
df.select(integerColumns.map(f => col(f.name)): _*).show()

// +---+
// | id|
// +---+
// |  1|
// |  2|
// |  3|
// +---+
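As a usage example, the integer column names can be fed to na.fill() to replace values on only those columns. The fill value 0 below is purely illustrative.

// Fill all integer columns with a value (0 here, as an illustration)
val intColNames = df.schema.fields
      .filter(_.dataType.isInstanceOf[IntegerType]).map(_.name)
df.na.fill(0, intColNames).show()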


In this article, you have learned how to check whether the data type of a column is string, integer, or any other type, and how to select all string or integer columns using Spark with Scala examples. You would typically need to find all string/integer columns to replace values for a specific type of column, for example, replacing all empty strings with null on all string columns, etc.

Happy Learning !!

