How does the createOrReplaceTempView() method work in Spark and what is it used for? One of the main advantages of Apache Spark is working with SQL along with DataFrame/Dataset API. So if you are comfortable with SQL, you can create a temporary view on DataFrame/Dataset by using
createOrReplaceTempView() and using SQL to select and manipulate the data.
A Temporary view in Spark is similar to a real SQL table that contains rows and columns but the view is not materialized into files. In this article, we will be discussing what is
createOrReplaceTempView() and how to use it to create a temporary view and run Spark SQL queries.
Note that throughout this article, I will use a table and view interchangeably.
1. Syntax of createOrReplaceTempView()
Following is the syntax of the
createOrReplaceTempView() method. This method is from the Dataset class. So you can use this method from Spark DataFrame and Dataset objects.
//Syntax of createOrReplaceTempView() createOrReplaceTempView(viewName):Unit
2. How Does createOrReplaceTempView() work in spark?
createOrReplaceTempView() in Spark creates a view only if not exist, if it exits it replaces the existing view with the new one. Spark SQL views are lazily evaluated meaning it does not persist in memory unless you cache the dataset by using the cache() method.
Some KeyPoints to note:
createOrReplaceTempView()is used when you wanted to store the table for a specific spark session.
- Once created you can use it to run SQL queries.
- These temporary views are session-scoped i.e. valid only that running spark session.
- It can’t be shared between the sessions
- These views will be dropped when the session ends unless you created it as Hive table.
saveAsTable()to materialize the contents of the DataFrame and create a pointer to the data in the metastore.
3. Create a Temporary View
createOrReplaceTempView() is used to create a temporary view/table from the Spark DataFrame or Dataset objects. Since it is a temporary view, the lifetime of the table/view is tied to the current SparkSession. Hence, It will be automatically removed when your spark session ends.
If you are using an older version prior to Spark 2.0, you can use
registerTempTable() to create a temporary table.
Following are the steps to create a temporary view in Spark and access it.
- Step1: Create a Spark DataFrame
- Step 2: Convert it to an SQL table (a.k.a view)
- Step 3: Access view using SQL query
3.1 Create a DataFrame
First, let’s create a Spark DataFrame with columns
// Create DataFrame val spark = SparkSession.builder .master("local") .appName("SparkByExamples.com") .getOrCreate() val data = Seq(("James","Smith","USA","CA"), ("Michael","Rose","USA","NY"), ("Robert","Williams","USA","CA"), ("Maria","Jones","USA","FL") ) val columns = Seq("firstname","lastname","country","state") import spark.implicits._ val df = data.toDF(columns:_*) df.show(false)
3.2 Create Table/View using Spark createOrReplaceTempView()
Let’s create a temporary view on top of the DataFrame object by using
df.createOrReplaceTempView(). Spark SQL temporary views are session-scoped and will not be available if the session that creates it terminates. The following examples create a
// Create Temporary View/Table df.createOrReplaceTempView("Person")
If you want to have a temporary view that is shared among all sessions and keep alive until the Spark application terminates, you can create a global temporary view using
3.3 Access View using Spark SQL Query
Using SparkSession you can access Spark SQL capabilities in Apache Spark. Once you have a temporary view you can run any ANSI SQL queries using
// Run SQL Query spark.sql("select firstname, lastname from Person").show()
Yields below output.
//Output +---------+--------+ |firstname|lastname| +---------+--------+ | James| Smith| | Michael| Rose| | Robert|Williams| | Maria| Jones| +---------+--------+
4. Use createOrReplaceTempView() on Azure Databricks
Below is a simple snippet on how to use
createOrReplaceTempView() on Azure Databricks and how to access it using Spark SQL query.
In this article, we discussed how to create a temporary view or table in spark by using createOrReplaceTempView() and explained how it works. Temp views are lazily evaluated views in spark and can be used same as like a table but they are not materialized and don’t have pointers for the data in metastore.
- Spark Create DataFrame with Examples
- Spark Setup with Scala and Run in IntelliJ
- Spark RDD Cache and Persist with Example
- Spark Window Functions with Examples
- Difference in DENSE_RANK and ROW_NUMBER in Spark