Spark spark.table() vs spark.read.table()

In Spark or PySpark, what is the difference between spark.table() and spark.read.table()? There is no practical difference: both methods read a table into a Spark DataFrame.

1. spark.table() vs spark.read.table()

There is no difference between the spark.table() and spark.read.table() functions. In fact, in the Spark source quoted in section 3, spark.read.table() internally calls spark.table().

You may wonder why Spark provides two syntaxes that do the same thing. The reason is consistency: spark.read returns a DataFrameReader, which can read several data sources such as CSV, Parquet, text, and Avro, so it also provides a method to read a table.
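To illustrate how table() sits next to the file-based readers on DataFrameReader, here is a minimal sketch. The object name ReaderFamily, the sample rows, the temp directory, and the view name employee_view are all assumptions made for illustration; a temp view stands in for a real table so the example does not need Hive.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReaderFamily {
  // Reads the same sample rows once by file path and once by table name.
  def readBoth(spark: SparkSession): (DataFrame, DataFrame) = {
    import spark.implicits._
    // Hypothetical sample data, not from the article
    val sample = Seq((1, "James"), (2, "Anna")).toDF("id", "name")

    // File-based source: write Parquet to a fresh temp directory, read it back by path
    val dir = Files.createTempDirectory("reader-family").resolve("employee").toString
    sample.write.parquet(dir)
    val fromFile = spark.read.parquet(dir)

    // Catalog-based source: register a temp view, read it back by name
    sample.createOrReplaceTempView("employee_view")
    val fromTable = spark.read.table("employee_view")

    (fromFile, fromTable)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("ReaderFamily").getOrCreate()
    val (fromFile, fromTable) = readBoth(spark)
    fromFile.show()
    fromTable.show()
    spark.stop()
  }
}
```

Both calls start from the same spark.read entry point; only the last method in the chain changes with the kind of source.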

2. spark.table() Usage

Here, spark is a SparkSession object and table() is a method of the SparkSession class, implemented as shown in the snippet below.


// Defined in class org.apache.spark.sql.SparkSession

def table(tableName: String): DataFrame = {
  table(sessionState.sqlParser.parseTableIdentifier(tableName))
}

3. spark.read.table() Usage

Here, spark is a SparkSession object, read returns a DataFrameReader, and table() is a method of the DataFrameReader class, implemented as shown in the snippet below. Notice that this method first checks that no user-specified schema was set on the reader and then calls SparkSession.table(), described above.


// Defined in class org.apache.spark.sql.DataFrameReader

def table(tableName: String): DataFrame = {
  assertNoSpecifiedSchema("table")
  sparkSession.table(tableName)
}
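The assertNoSpecifiedSchema("table") line suggests that setting a schema on the reader before calling table() is rejected. The following sketch probes that behavior; it assumes the check surfaces as an AnalysisException, which may vary by Spark version. The object name SchemaWithTable, the helper schemaRejected, and the view name employee_view are hypothetical.

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object SchemaWithTable {
  // Returns true if spark.read.schema(...).table(...) is rejected for the given table.
  def schemaRejected(spark: SparkSession, tableName: String): Boolean = {
    val schema = StructType(Seq(
      StructField("id", IntegerType),
      StructField("name", StringType)))
    try {
      spark.read.schema(schema).table(tableName)
      false // no exception: this Spark version accepted the schema
    } catch {
      case _: AnalysisException => true // assumed exception type for the schema check
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("SchemaWithTable").getOrCreate()
    import spark.implicits._
    // A temp view stands in for a real table (hypothetical data)
    Seq((1, "James")).toDF("id", "name").createOrReplaceTempView("employee_view")
    println(s"Schema rejected: ${schemaRejected(spark, "employee_view")}")
    spark.stop()
  }
}
```

A table already carries its own schema in the catalog, which is presumably why the reader refuses a second, user-supplied one.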

4. Example Spark Read Table

The example below shows how to read a Hive table into a Spark DataFrame using both the spark.read.table() and spark.table() methods.

import org.apache.spark.sql.SparkSession

object ReadHiveTable extends App {

  // Create SparkSession with Hive enabled
  val spark = SparkSession.builder().master("local[*]")
    .appName("SparkByExamples.com")
    .enableHiveSupport()
    .getOrCreate()

  // Read table using spark.read.table()
  val df = spark.read.table("emp.employee")
  df.show()

  // Read table using spark.table()
  val df2 = spark.table("emp.employee")
  df2.show()
}

Both show() calls in the example above yield the same output.
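Rather than comparing the printed output by eye, the equivalence can also be checked in code. The sketch below is one possible way to do it, assuming Spark 2.4 or later (for exceptAll and isEmpty); the object name CompareReads, the helper sameContent, and the temp view employee_view are hypothetical and replace the Hive table so the example runs without a metastore.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CompareReads {
  // True when both DataFrames have the same schema and the same rows,
  // counting duplicates (exceptAll keeps duplicate rows, unlike except).
  def sameContent(a: DataFrame, b: DataFrame): Boolean =
    a.schema == b.schema &&
      a.exceptAll(b).isEmpty &&
      b.exceptAll(a).isEmpty

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("CompareReads").getOrCreate()
    import spark.implicits._
    // Hypothetical sample data registered as a temp view
    Seq((1, "James"), (2, "Anna")).toDF("id", "name").createOrReplaceTempView("employee_view")

    val df = spark.read.table("employee_view")
    val df2 = spark.table("employee_view")
    println(s"Same content: ${sameContent(df, df2)}")
    spark.stop()
  }
}
```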

5. Conclusion

In this article, you have learned the difference between the spark.table() and spark.read.table() methods. As shown, both behave the same way and are used to read a table into a DataFrame.

Naveen (NNK)

