Read Snowflake table into Spark DataFrame

In this Snowflake data warehouse article, I will explain how to read a Snowflake table into a Spark DataFrame and cover the different connection properties, using the Scala language.

Pre-requisites

  • Snowflake data warehouse account
  • Basic understanding of Spark and an IDE to run Spark programs

If you are reading this tutorial, you probably already know what the Snowflake database is. In case you are not aware, in simple terms Snowflake is a purely cloud-based data storage and analytics data warehouse provided as Software-as-a-Service (SaaS).

Snowflake's architecture is an entirely new SQL database engine designed to work with cloud infrastructure without departing from ANSI SQL standards; hence, it is easy to learn if you have a SQL background.

If you want to create a Snowflake table and insert some data first, you can do this either from the Snowflake web console or by following Writing Spark DataFrame to Snowflake table.

Maven Dependency

<dependency>
     <groupId>net.snowflake</groupId>
     <artifactId>spark-snowflake_2.11</artifactId>
     <version>2.5.9-spark_2.4</version>
</dependency>
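
If your build uses sbt instead of Maven, the equivalent dependency (same artifact and version as above, built for Scala 2.11 / Spark 2.4) would look like this in build.sbt:

// build.sbt -- equivalent of the Maven dependency above
libraryDependencies += "net.snowflake" % "spark-snowflake_2.11" % "2.5.9-spark_2.4"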

Spark Connection parameters

To establish a connection from Spark to Snowflake, we need to provide the following connection properties as Spark options (a sample options map follows this list).

  • sfURL : URL of your account, e.g. https://oea82.us-east-1.snowflakecomputing.com/
  • sfAccount : Your account name; you can get this from the URL, e.g. “oea82”
  • sfUser : Snowflake user name, typically your login user
  • sfPassword : user password
  • sfWarehouse : Snowflake Data warehouse name
  • sfDatabase : Snowflake Database name
  • sfSchema : Database schema where your table belongs
  • sfRole : Snowflake user role
  • and more
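
These properties are passed to Spark as a plain Map[String, String]. Below is a minimal sketch of such a map; the account, warehouse, and database names are placeholders, and the password is read from an environment variable (SNOWFLAKE_PASSWORD is an assumed name) simply to avoid hardcoding credentials.

  // Hypothetical connection options -- replace the placeholders with your own account details
  val sfOptions: Map[String, String] = Map(
    "sfURL"       -> "https://<account>.us-east-1.snowflakecomputing.com/",
    "sfAccount"   -> "<account>",
    "sfUser"      -> "user",
    "sfPassword"  -> sys.env.getOrElse("SNOWFLAKE_PASSWORD", ""), // avoid hardcoding passwords
    "sfWarehouse" -> "COMPUTE_WH",   // warehouse used to run the queries
    "sfDatabase"  -> "EMP",
    "sfSchema"    -> "PUBLIC",
    "sfRole"      -> "ACCOUNTADMIN"
  )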

Read Snowflake table into Spark DataFrame

Use the read() method of the SparkSession (which returns a DataFrameReader object), along with the methods below.

Use format() to specify the data source name, either snowflake or net.snowflake.spark.snowflake.

Use option() to specify the connection parameters discussed above, such as URL, account, username, password, database name, schema, role and more.

Use the dbtable option to specify the Snowflake table name you want to read from, or use the query option to execute a specific query.


package com.sparkbyexamples.spark

import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadEmpFromSnowflake extends App {

  val spark = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExamples.com")
    .getOrCreate()

  // Snowflake connection properties
  val sfOptions = Map(
    "sfURL" -> "https://oea82.us-east-1.snowflakecomputing.com/",
    "sfAccount" -> "oea82",
    "sfUser" -> "user",
    "sfPassword" -> "#############",
    "sfDatabase" -> "EMP",
    "sfSchema" -> "PUBLIC",
    "sfRole" -> "ACCOUNTADMIN"
  )

  // Read the entire EMPLOYEE table into a DataFrame
  val df: DataFrame = spark.read
    .format("net.snowflake.spark.snowflake") // or just use "snowflake"
    .options(sfOptions)
    .option("dbtable", "EMPLOYEE")
    .load()

  df.show(false)
}

This yields the below output.


+-------+----------+------+
|NAME   |DEPARTMENT|SALARY|
+-------+----------+------+
|James  |Sales     |3000  |
|Michael|Sales     |4600  |
|Robert |Sales     |4100  |
|Maria  |Finance   |3000  |
|Raman  |Finance   |3000  |
|Scott  |Finance   |3300  |
|Jen    |Finance   |3900  |
|Jeff   |Marketing |3000  |
|Kumar  |Marketing |2000  |
+-------+----------+------+

The above example demonstrates reading an entire Snowflake table into a Spark DataFrame using the dbtable option; the example below uses the query option to execute a group-by aggregate SQL query.


  val df1: DataFrame = spark.read
    .format("net.snowflake.spark.snowflake")
    .options(sfOptions)
    .option("query", "select department, sum(salary) as total_salary from EMPLOYEE group by department")
    .load()

  df1.show(false)

This yields the below output.


+----------+------------+
|DEPARTMENT|TOTAL_SALARY|
+----------+------------+
|Sales     |11700       |
|Finance   |13200       |
|Marketing |5000        |
+----------+------------+
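
The same aggregation can also be expressed with the DataFrame API on the table read earlier through the dbtable option; a minimal sketch, reusing the df from the first example, that produces the same totals:

  import org.apache.spark.sql.functions.sum

  // Equivalent of the SQL query above, expressed as DataFrame transformations
  val totals = df.groupBy("DEPARTMENT")
    .agg(sum("SALARY").alias("TOTAL_SALARY"))

  totals.show(false)

With the query option the aggregation runs inside Snowflake, while the DataFrame version is planned by Spark (though the connector may push parts of it back down to Snowflake, depending on version and settings).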


Complete example


package com.sparkbyexamples.spark

import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadEmpFromSnowflake extends App {

  val spark = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExamples.com")
    .getOrCreate()

  // Snowflake connection properties
  val sfOptions = Map(
    "sfURL" -> "https://oea81082.us-east-1.snowflakecomputing.com/",
    "sfAccount" -> "oea81082",
    "sfUser" -> "sfusername",
    "sfPassword" -> "#####1",
    "sfDatabase" -> "EMP",
    "sfSchema" -> "PUBLIC",
    "sfRole" -> "ACCOUNTADMIN"
  )

  // Read the entire EMPLOYEE table
  val df: DataFrame = spark.read
    .format("net.snowflake.spark.snowflake")
    .options(sfOptions)
    .option("dbtable", "EMPLOYEE")
    .load()

  df.show(false)

  // Execute an aggregate query and read the result
  val df1: DataFrame = spark.read
    .format("net.snowflake.spark.snowflake")
    .options(sfOptions)
    .option("query", "select department, sum(salary) as total_salary from EMPLOYEE group by department")
    .load()

  df1.show(false)
}

This Spark Snowflake connector Scala example is also available at the GitHub project ReadEmpFromSnowflake.

Conclusion

In this tutorial, you have learned how to read a Snowflake table into a Spark DataFrame and the different options used to connect to a Snowflake table.

Happy Learning !!

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium