How do you read a JDBC table into a Spark DataFrame? Spark provides DataFrameReader.jdbc() to read a JDBC table into a Spark DataFrame. The usage is SparkSession.read.jdbc(); here, read is an object of the DataFrameReader class and jdbc() is a method on it.
In this article, I will explain the syntax of the jdbc() method, how to connect to the database, and how to read a JDBC table into a Spark DataFrame using Spark with the MySQL connector.
Steps to use spark.read.jdbc():
- Step 1 – Identify the Spark connector to use
- Step 2 – Add the dependency
- Step 3 – Create SparkSession
- Step 4 – Read the JDBC table into a Spark DataFrame
1. Syntax of DataFrameReader.jdbc()
The DataFrameReader provides several syntaxes of the jdbc() method. You can use any of these based on your need.
// Syntax of jdbc()
jdbc(url: String, table: String, connectionProperties: Properties): DataFrame
jdbc(url: String,table: String,predicates: Array[String],connectionProperties: Properties): DataFrame
jdbc(url: String,table: String,columnName: String,lowerBound: Long,upperBound: Long,numPartitions: Int,connectionProperties: Properties): DataFrame
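The third overload is the one that reads in parallel: Spark splits the range between lowerBound and upperBound on columnName into numPartitions stride ranges and issues one query per partition. Below is a minimal pure-Scala sketch of that idea, illustrative only and not Spark's exact internal code; the object and method names (JdbcPartitionSketch, partitionClauses) are hypothetical.

```scala
// Illustrative sketch: how a (columnName, lowerBound, upperBound, numPartitions)
// read is split into per-partition WHERE clauses. Not Spark's exact internals.
object JdbcPartitionSketch {
  def partitionClauses(col: String, lower: Long, upper: Long, n: Int): Seq[String] = {
    val stride = (upper - lower) / n
    (0 until n).map { i =>
      val lo = lower + i * stride
      val hi = lo + stride
      if (i == 0) s"$col < $hi"            // first partition also catches values below lowerBound
      else if (i == n - 1) s"$col >= $lo"  // last partition also catches values above upperBound
      else s"$col >= $lo AND $col < $hi"
    }
  }
}
```

For ("id", 0, 100, 4) this produces the clauses id < 25, id >= 25 AND id < 50, id >= 50 AND id < 75, and id >= 75. Note that the bounds only shape the partition stride; they do not filter rows out of the result.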
2. Spark Read JDBC
To read a table using the Spark jdbc() method, you need at minimum a driver, server IP (or host name), port, database name, table name, user, and password. JDBC is a Java standard for connecting to any database: you provide the right JDBC connector jar on the classpath and specify a JDBC driver class via the JDBC API. Spark leverages the same JDBC standard in its jdbc() method.
For more JDBC properties refer to https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
The connector I am using in this article is mysql-connector-java, and the driver class I am using is com.mysql.cj.jdbc.Driver (the driver class for MySQL Connector/J 8.x; the older 5.x connectors used com.mysql.jdbc.Driver).
MySQL provides a connector for each server version, so please choose the right version if you are also using MySQL. If you are using Maven, add the below dependencies to your pom.xml. For my example, I am using Spark 3.2.1 and Scala 2.13, hence I chose the following dependencies along with the MySQL connector.
<!-- Maven dependencies: Spark and the MySQL connector -->
<dependencies>
  <dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>8.0.13</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.13</artifactId>
    <version>3.2.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.13</artifactId>
    <version>3.2.1</version>
  </dependency>
</dependencies>
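If you build with sbt instead of Maven, the equivalent dependencies would look like the following in build.sbt (the 2.13.8 patch version is an assumption; any Scala 2.13.x works here):

```scala
// build.sbt -- same artifact versions as the Maven example above
scalaVersion := "2.13.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.2.1",
  "org.apache.spark" %% "spark-sql"  % "3.2.1",
  "mysql"            %  "mysql-connector-java" % "8.0.13"
)
```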
3. Spark Read jdbc() Example
I have a MySQL database emp and a table employee with columns id, name, age, and gender.

I will use the jdbc() method to read this table into Spark DataFrame.
import org.apache.spark.sql.SparkSession
import java.util.Properties
// Create SparkSession in spark 2.x or later
val spark = SparkSession.builder().master("local[*]")
.appName("SparkByExamples.com")
.getOrCreate()
// Connection Properties
val connProp = new Properties()
connProp.setProperty("driver", "com.mysql.cj.jdbc.Driver")
connProp.put("user", "root")
connProp.put("password", "root")
// Read from MySQL Table
val df = spark.read
.jdbc("jdbc:mysql://localhost:3306/emp", "employee", connProp)
// Show DataFrame
df.show()
This yields the DataFrame output with the id, name, age, and gender columns.
Alternatively, you can use spark.read.format("jdbc").load() to read the table. With this approach, you provide the database details through the option() method.
// Read from MySQL Table
val df = spark.read
.format("jdbc")
.option("driver","com.mysql.cj.jdbc.Driver")
.option("url", "jdbc:mysql://localhost:3306/emp")
.option("dbtable", "employee")
.option("user", "root")
.option("password", "root")
.load()
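If you prefer to keep the connection settings in one place, the same values can be collected into a plain Scala Map and passed in a single call with the DataFrameReader options() method; the values below are the same placeholders used above.

```scala
// Collect the JDBC settings into one Map (same placeholder values as above)
val jdbcOptions: Map[String, String] = Map(
  "driver"   -> "com.mysql.cj.jdbc.Driver",
  "url"      -> "jdbc:mysql://localhost:3306/emp",
  "dbtable"  -> "employee",
  "user"     -> "root",
  "password" -> "root"
)

// Pass them all at once instead of chaining option() calls:
// val df = spark.read.format("jdbc").options(jdbcOptions).load()
```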
4. Select Specific Columns to Read
The above example reads the entire table into a Spark DataFrame. Sometimes you may not need the entire table, so to select specific columns, supply the query you want to run in the dbtable option. Note that Spark treats the dbtable value as a subquery, so a raw SELECT must be wrapped in parentheses and given an alias.
// Read selected columns from the MySQL table
val df = spark.read
  .format("jdbc")
  .option("driver", "com.mysql.cj.jdbc.Driver")
  .option("url", "jdbc:mysql://localhost:3306/emp")
  .option("dbtable", "(select id, age from employee where gender = 'M') as emp_sub")
  .option("user", "root")
  .option("password", "root")
  .load()
df.show()
Alternatively, you can also use the query option. Note that you can use either the dbtable or the query option, but not both at a time. Also, when using the query option, you can't use the partitionColumn option.
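For reference, a query option value can always be rewritten as a dbtable value by wrapping the SELECT in parentheses and adding a table alias so Spark can treat it as a subquery (the emp_sub alias below is just an illustrative name):

```scala
// dbtable equivalent of the query option: a parenthesized, aliased subquery
val subquery = "(select id, age from employee where gender = 'M') as emp_sub"

// val df = spark.read.format("jdbc").option("dbtable", subquery) ... .load()
```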
// Using query
val df = spark.read
.format("jdbc")
.option("query", "select id,age from employee where gender='M'")
.......
.......
.load()
5. Read JDBC Table in Parallel
Use the numPartitions option to read a JDBC table in parallel. This property also determines the maximum number of concurrent JDBC connections to use. Note that numPartitions alone does not split the read: Spark also needs partitionColumn, lowerBound, and upperBound to divide the table into ranges, and since partitionColumn cannot be combined with the query option, use dbtable here. The below example creates the DataFrame with 5 partitions. I have also used the fetchsize option, which specifies how many rows to fetch per round trip; its default depends on the JDBC driver (Oracle's driver, for example, defaults to 10 rows).
// Using numPartitions with a partition column
val df = spark.read
  .format("jdbc")
  .option("dbtable", "employee")
  .option("partitionColumn", "id")
  .option("lowerBound", 0)     // bounds are illustrative; they shape the
  .option("upperBound", 1000)  // partition stride, they do not filter rows
  .option("numPartitions", 5)
  .option("fetchsize", 20)
  .......
  .......
  .load()
6. Complete Example
Following is the complete example of how to read a table using jdbc() method.
import org.apache.spark.sql.SparkSession
import java.util.Properties
object ReadJDBC extends App {
// Create SparkSession in spark 2.x or later
val spark = SparkSession.builder().master("local[*]")
.appName("SparkByExamples.com")
.getOrCreate()
// Connection Properties
val connProp = new Properties()
connProp.setProperty("driver", "com.mysql.cj.jdbc.Driver")
connProp.put("user", "root")
connProp.put("password", "root")
// Read from MySQL Table
val df = spark.read
.jdbc("jdbc:mysql://localhost:3306/emp", "employee", connProp)
// Show DataFrame
df.show()
// Using format()
val df2 = spark.read
.format("jdbc")
.option("driver","com.mysql.cj.jdbc.Driver")
.option("url", "jdbc:mysql://localhost:3306/emp")
.option("dbtable", "employee")
.option("user", "root")
.option("password", "root")
.load()
df2.show()
}
If you get the error java.lang.ClassNotFoundException: com.mysql.cj.jdbc.Driver when you run this code, you probably don't have the MySQL jar on the classpath. Follow the below steps in IntelliJ to resolve it.
- Copy the MySQL jar somewhere on your system (you can get it from the .m2 repository).
- Go to File > Project Structure.
- Select Modules in the left panel and select the Dependencies tab.
- Select the + icon and choose the JARs or Directories option.
- Select your JAR file, or select the directory that contains it.
- Click the OK button.
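Outside the IDE, a common alternative is to let spark-submit (or spark-shell) pull the connector for you with --packages, or to point --jars at a local copy. The application jar path and class name below are illustrative placeholders:

```shell
# Pull the connector from Maven Central at submit time
spark-submit --packages mysql:mysql-connector-java:8.0.13 \
  --class ReadJDBC target/your-app.jar

# Or add a local jar to both the driver and executor classpaths
spark-submit --jars /path/to/mysql-connector-java-8.0.13.jar \
  --class ReadJDBC target/your-app.jar
```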
Conclusion
In this article, you have learned how to read a JDBC table using the Spark jdbc() method. You also learned how to connect to a MySQL database and read a table into a DataFrame.
Related Articles
- How to Connect Spark to Remote Hive
- Spark with SQL Server – Read and Write Table
- Spark Read Hive Table to Spark DataFrame
- Spark Save DataFrame to Hive Table
- Spark spark.table() vs spark.read.table()
- Spark SQL Create a Table
- Spark Read JDBC in Parallel
- Spark Types of Tables and Views
- Spark Drop, Delete, Truncate Differences
- Time Travel with Delta Tables in Databricks?
- Spark createOrReplaceTempView() Explained
- Difference in DENSE_RANK and ROW_NUMBER in Spark