In Spark or PySpark, what is the difference between spark.table() and spark.read.table()? In practice there is none: both methods read a table into a Spark DataFrame.
1. spark.table() vs spark.read.table()
There is no functional difference between spark.table() and spark.read.table(). In the implementation shown later in this article, spark.read.table() internally calls spark.table().
It is understandably confusing that Spark provides two syntaxes that do the same thing. The reason is that spark.read, which returns a DataFrameReader object, provides methods to read several data sources such as CSV, Parquet, Text, and Avro, so it also provides a method to read a table.
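To make this concrete, here is a minimal sketch showing table() sitting next to the other DataFrameReader methods. The object name ReaderEntryPoints, the view name people_view, and the sample rows are made up for illustration; a temporary view stands in for a real table so that no Hive metastore is needed.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrameReader, SparkSession}

object ReaderEntryPoints extends App {
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("ReaderEntryPoints")
    .getOrCreate()
  import spark.implicits._

  // Hypothetical sample data used only for this sketch
  val people = Seq((1, "Ann"), (2, "Bob")).toDF("id", "name")

  // spark.read returns a DataFrameReader, the common entry point for all sources
  val reader: DataFrameReader = spark.read

  // File-based source: write Parquet to a fresh temp sub-directory, then read it back
  val path = Files.createTempDirectory("reader-demo").resolve("people").toString
  people.write.parquet(path)
  val fromFile = reader.parquet(path)

  // Table source: the same reader API also exposes table()
  people.createOrReplaceTempView("people_view")
  val fromTable = spark.read.table("people_view")

  // Materialize the counts before the session is stopped
  val fileRows = fromFile.count()
  val tableRows = fromTable.count()
  println(s"rows from parquet: $fileRows, rows from table: $tableRows")

  spark.stop()
}
```

Seen this way, spark.read.table() is simply the table-flavoured member of the reader family, while spark.table() is a shortcut on the session itself.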
2. spark.table() Usage
Here, spark is an object of SparkSession, and table() is a method of the SparkSession class, which contains the code snippet below.
// Defined in class org.apache.spark.sql.SparkSession
def table(tableName: String): DataFrame = {
  table(sessionState.sqlParser.parseTableIdentifier(tableName))
}
3. spark.read.table() Usage
Here, spark is an object of SparkSession, read returns an object of DataFrameReader, and table() is a method of the DataFrameReader class, which contains the code snippet below. Notice that inside this method it calls the SparkSession.table() method described above.
// Defined in class org.apache.spark.sql.DataFrameReader
def table(tableName: String): DataFrame = {
  assertNoSpecifiedSchema("table")
  sparkSession.table(tableName)
}
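The assertNoSpecifiedSchema("table") line in the snippet above is the one extra step that spark.read.table() performs. The sketch below illustrates what it guards against, assuming the check raises an AnalysisException when a schema has been set on the reader. The object name ReadTableWithSchema and the view name ids are hypothetical.

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

object ReadTableWithSchema extends App {
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("ReadTableWithSchema")
    .getOrCreate()
  import spark.implicits._

  // Hypothetical temporary view standing in for a real table
  Seq(1, 2, 3).toDF("id").createOrReplaceTempView("ids")

  val schema = StructType(Seq(StructField("id", IntegerType)))

  // A table already carries its own schema, so supplying one is expected to be rejected
  val rejected: Boolean =
    try {
      spark.read.schema(schema).table("ids")
      false
    } catch {
      case _: AnalysisException => true
    }

  // Without a user-specified schema, both entry points read the view normally
  val rowsViaReader = spark.read.table("ids").count()
  val rowsViaSession = spark.table("ids").count()

  println(s"schema + table() rejected: $rejected")
  println(s"rows via reader: $rowsViaReader, rows via session: $rowsViaSession")

  spark.stop()
}
```

Since spark.table() has no reader state to validate, this check is the only place where the two calls can behave differently in the code shown here.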
4. Example Spark Read Table
The example below shows how to read a Hive table into a Spark DataFrame using the spark.read.table() and spark.table() methods.
import org.apache.spark.sql.SparkSession

object ReadHiveTable extends App {

  // Create SparkSession with Hive support enabled
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("SparkByExamples.com")
    .enableHiveSupport()
    .getOrCreate()

  // Read table using spark.read.table()
  val df = spark.read.table("emp.employee")
  df.show()

  // Read table using spark.table()
  val df2 = spark.table("emp.employee")
  df2.show()
}

Both show() calls in the example above yield the same output.
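If you want to confirm the equivalence yourself without a Hive metastore, a rough check like the following may help. It is only a sketch: the object name CompareTableReads, the view name employee_view, and the sample rows are invented for illustration, and a temporary view replaces the emp.employee table.

```scala
import org.apache.spark.sql.SparkSession

object CompareTableReads extends App {
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("CompareTableReads")
    .getOrCreate()
  import spark.implicits._

  // Hypothetical data registered as a temporary view
  Seq(("James", 3000), ("Anna", 4100))
    .toDF("name", "salary")
    .createOrReplaceTempView("employee_view")

  val viaReader = spark.read.table("employee_view")
  val viaSession = spark.table("employee_view")

  // Same column names and types
  val sameSchema = viaReader.schema == viaSession.schema

  // Same rows: the set difference is empty in both directions
  val sameRows =
    viaReader.except(viaSession).count() == 0 &&
      viaSession.except(viaReader).count() == 0

  println(s"same schema: $sameSchema, same rows: $sameRows")

  spark.stop()
}
```

Note that except() compares distinct rows, so this is a quick sanity check rather than a strict multiset comparison.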
5. Conclusion
In this article, you have learned the difference between the spark.table() and spark.read.table() methods. As shown, both do the same thing and are used to read a table into a DataFrame.