Spark createOrReplaceTempView() Explained

How does the createOrReplaceTempView() method work in Spark and what is it used for? One of the main advantages of Apache Spark is working with SQL along with DataFrame/Dataset API. So if you are comfortable with SQL, you can create a temporary view on DataFrame/Dataset by using createOrReplaceTempView() and using SQL to select and manipulate the data.

A Temporary view in Spark is similar to a real SQL table that contains rows and columns but the view is not materialized into files. In this article, we will be discussing what is createOrReplaceTempView() and how to use it to create a temporary view and run Spark SQL queries.

Note that throughout this article, I will use a table and view interchangeably.

1. Syntax of createOrReplaceTempView()

Following is the syntax of the createOrReplaceTempView() method. This method is from the Dataset class. So you can use this method from Spark DataFrame and Dataset objects.


// Syntax of createOrReplaceTempView()
createOrReplaceTempView(viewName):Unit

2. How Does createOrReplaceTempView() work in spark?

createOrReplaceTempView() in Spark creates a view only if not exist, if it exits it replaces the existing view with the new one. Spark SQL views are lazily evaluated meaning it does not persist in memory unless you cache the dataset by using the cache() method.

Some KeyPoints to note:

  • createOrReplaceTempView() is used when you wanted to store the table for a specific spark session.
  • Once created you can use it to run SQL queries.
  • These temporary views are session-scoped i.e. valid only that running spark session.
  • It can’t be shared between the sessions
  • These views will be dropped when the session ends unless you created it as Hive table.
  • Use saveAsTable() to materialize the contents of the DataFrame and create a pointer to the data in the metastore.

3. Create a Temporary View

The createOrReplaceTempView() is used to create a temporary view/table from the Spark DataFrame or Dataset objects. Since it is a temporary view, the lifetime of the table/view is tied to the current SparkSession. Hence, It will be automatically removed when your spark session ends.

If you are using an older version prior to Spark 2.0, you can use registerTempTable() to create a temporary table.

Following are the steps to create a temporary view in Spark and access it.

  • Step1: Create a Spark DataFrame
  • Step 2: Convert it to an SQL table (a.k.a view)
  • Step 3: Access view using SQL query

3.1 Create a DataFrame

First, let’s create a Spark DataFrame with columns firstname, lastname, country and state columns.


// Create DataFrame
val spark = SparkSession.builder
    .master("local[1]")
    .appName("SparkByExamples.com")
    .getOrCreate()

val data = Seq(("James","Smith","USA","CA"),
  ("Michael","Rose","USA","NY"),
  ("Robert","Williams","USA","CA"),
  ("Maria","Jones","USA","FL")
  )
val columns = Seq("firstname","lastname","country","state")
import spark.implicits._
val df = data.toDF(columns:_*)
df.show(false)

3.2 Create Table/View using Spark createOrReplaceTempView()

Let’s create a temporary view on top of the DataFrame object by using df.createOrReplaceTempView(). Spark SQL temporary views are session-scoped and will not be available if the session that creates it terminates. The following examples create a Person table.


// Create Temporary View/Table 
df.createOrReplaceTempView("Person")

If you want to have a temporary view that is shared among all sessions and keep alive until the Spark application terminates, you can create a global temporary view using createGlobalTempView()

3.3 Access View using Spark SQL Query

Using SparkSession you can access Spark SQL capabilities in Apache Spark. Once you have a temporary view you can run any ANSI SQL queries using spark.sql() method.


// Run SQL Query
spark.sql("select firstname, lastname from Person").show()

Yields below output.


// Output:
+---------+--------+
|firstname|lastname|
+---------+--------+
|    James|   Smith|
|  Michael|    Rose|
|   Robert|Williams|
|    Maria|   Jones|
+---------+--------+

4. Use createOrReplaceTempView() on Azure Databricks

Below is a simple snippet on how to use createOrReplaceTempView() on Azure Databricks and how to access it using Spark SQL query.

createOrReplaceTempView Spark
createOrReplaceTempView in Spark

5. Conclusion

In this article, we discussed how to create a temporary view or table in spark by using createOrReplaceTempView() and explained how it works. Temp views are lazily evaluated views in spark and can be used same as like a table but they are not materialized and don’t have pointers for the data in metastore.

References

rimmalapudi

Data Engineer. I write about BigData Architecture, tools and techniques that are used to build Bigdata pipelines and other generic blogs.

Leave a Reply

This Post Has 2 Comments

  1. Anonymous

    The data is temporary cached in memory or disk?

  2. Alexey

    Thanks! Very helpful. This method works also to exchange data between interpreters in Zeppelin

You are currently viewing Spark createOrReplaceTempView() Explained