Spark – Check if DataFrame or Dataset is empty?

In Spark, isEmpty of the DataFrame class is used to check whether a DataFrame or Dataset is empty; it returns true when the DataFrame is empty and false otherwise. Besides this, Spark also has multiple other ways to check if a DataFrame is empty. In this article, I will explain these different ways and compare their performance to see which one is best to use.

First, let’s create an empty DataFrame

// Create an empty DataFrame
val df = spark.emptyDataFrame

Using isEmpty of the DataFrame or Dataset

The isEmpty function of the DataFrame or Dataset returns true when the Dataset is empty and false when it's not.
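For example, calling isEmpty on the empty DataFrame created above prints true (a minimal sketch, assuming a running SparkSession named spark):

```scala
// isEmpty returns true because df has no rows
println(df.isEmpty)  // true
```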


Alternatively, you can also check whether the DataFrame is empty without calling isEmpty.
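One common alternative is to take at most one row and check whether the result is empty; this avoids scanning the entire DataFrame. The original code block is not shown here, so this is a sketch of the usual approach:

```scala
// head(1) returns an Array[Row] with at most one element;
// an empty array means the DataFrame is empty
println(df.head(1).isEmpty)  // true for an empty DataFrame

// Equivalent check using limit + count
println(df.limit(1).count() == 0)  // true for an empty DataFrame
```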


Note that calling df.head() or df.first() on an empty DataFrame throws a java.util.NoSuchElementException: next on empty iterator exception.
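If you do need to call first() on a DataFrame that may be empty, you can guard against this exception, for example with scala.util.Try (a sketch):

```scala
import scala.util.Try

// first() throws NoSuchElementException on an empty DataFrame,
// so wrap the call and convert the result to an Option
val firstRow = Try(df.first()).toOption
println(firstRow)  // None when the DataFrame is empty
```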

You can also use the approach below, but it is not as efficient as the ones above, so use it only when you have a small dataset. df.count computes the count across all partitions on all nodes, so avoid it when you have millions of records.

// Not recommended for large datasets: triggers a full count
println(df.count > 0)

Using isEmpty of the RDD

This is the most performant way to check whether a DataFrame or Dataset is empty.

// Using isEmpty of the RDD
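Converting the DataFrame to its underlying RDD and calling isEmpty only needs to take a single element rather than scan the full data, which is why it is cheap (a sketch):

```scala
// rdd.isEmpty takes at most one element to decide,
// so it avoids counting every record
println(df.rdd.isEmpty)  // true for an empty DataFrame
```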


In summary, we can check whether a Spark DataFrame is empty by using the isEmpty function of the DataFrame, Dataset, or RDD. If you have performance issues calling it on a DataFrame, try df.rdd.isEmpty.

Happy Learning !!
