Spark – How to create an empty DataFrame?


In this article, I will explain how to create an empty Spark DataFrame with several Scala examples. Below I describe one of the many scenarios where we need to create an empty DataFrame.

While working with files, sometimes we may not receive a file for processing; however, we still need to create a DataFrame similar to the one we create when we do receive a file. If we don't create it with the same schema, our operations/transformations on the DataFrame fail because we refer to columns that may not be present.

To handle situations like these, we always need to create a DataFrame with the same schema, meaning the same column names and datatypes, regardless of whether the file exists or an empty file is processed.
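For instance, the pattern could look like this (a minimal sketch; the file path, CSV format, and column names here are assumptions for illustration, not part of any specific pipeline):

```scala
import java.nio.file.{Files, Paths}

import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val spark: SparkSession = SparkSession.builder()
  .master("local[1]")
  .appName("SparkByExamples.com")
  .getOrCreate()

val schema = StructType(
  StructField("firstName", StringType, true) ::
    StructField("lastName", StringType, true) ::
    StructField("middleName", StringType, true) :: Nil)

// Hypothetical input location; substitute your own path.
val inputPath = "data/names.csv"

// Read the file when it exists; otherwise fall back to an empty
// DataFrame with the same schema, so downstream transformations
// that reference these columns never fail.
val df: DataFrame =
  if (Files.exists(Paths.get(inputPath)))
    spark.read.schema(schema).csv(inputPath)
  else
    spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
```

Either branch yields a DataFrame with the same three string columns, so code such as `df.select("firstName")` works whether or not the file arrived.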

First, let's create the schema, columns, and case class that I will use in the rest of the article.


  import org.apache.spark.sql.{Row, SparkSession}
  import org.apache.spark.sql.types.{StringType, StructField, StructType}

  val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExamples.com")
    .getOrCreate()

  import spark.implicits._

  // All three columns are nullable strings, matching the schema printed
  // at the end of the article.
  val schema = StructType(
    StructField("firstName", StringType, true) ::
      StructField("lastName", StringType, true) ::
      StructField("middleName", StringType, true) :: Nil)

  val colSeq = Seq("firstName", "lastName", "middleName")

  case class Name(firstName: String, lastName: String, middleName: String)

Creating an empty DataFrame (Spark 2.x and above)

SparkSession provides an emptyDataFrame method, which returns an empty DataFrame with an empty schema. That is rarely enough; usually we want to create the DataFrame with a specified StructType schema.

val df = spark.emptyDataFrame
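To see why this falls short, a quick check (a minimal sketch, self-contained with its own SparkSession) shows that the resulting schema has no fields at all:

```scala
import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession.builder()
  .master("local[1]")
  .appName("SparkByExamples.com")
  .getOrCreate()

// spark.emptyDataFrame has zero rows AND zero columns.
val empty = spark.emptyDataFrame
println(empty.schema.fields.length) // 0 -- no columns at all
empty.printSchema()                 // prints only "root"
```

Any transformation that references a column, such as `empty.select("firstName")`, fails on this DataFrame, which is why the schema-aware approaches below are preferable.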

Create empty DataFrame with schema (StructType)

Use createDataFrame() from SparkSession


val df = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

Using implicit encoder

Let’s see another way, which uses implicit encoders.


Seq.empty[(String,String,String)].toDF(colSeq:_*)

Using case class

We can also create an empty DataFrame with the desired schema from a Scala case class.


Seq.empty[Name].toDF()

All the examples above (except spark.emptyDataFrame) produce a DataFrame with zero records and the schema below.


root
 |-- firstName: string (nullable = true)
 |-- lastName: string (nullable = true)
 |-- middleName: string (nullable = true)

Happy Learning !!

NNK

