While running a Spark application with Hive enabled, are you getting the below error message?
Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
at org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:778)
at com.training.hivetest.App.main(App.java:21)
You get this issue because you don't have the appropriate Spark Hive library in your dependencies. Let's add the following dependencies to your pom.xml file and run a sample example that saves a Spark DataFrame to a Hive table.
1. Unable to instantiate SparkSession with Hive support because Hive classes are not found
To enable Hive support you would need the following dependencies.
<!-- Spark dependencies required for Hive support -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.13</artifactId>
  <version>3.2.1</version>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.13</artifactId>
  <version>3.2.1</version>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive_2.13</artifactId>
  <version>3.2.1</version>
</dependency>
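If your project uses sbt instead of Maven, the equivalent dependencies would look roughly like this (a sketch, assuming an sbt build on Scala 2.13; adjust the Spark version to match your environment):

// Sketch: equivalent sbt dependencies for Spark with Hive support
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.2.1",
  "org.apache.spark" %% "spark-sql"  % "3.2.1",
  "org.apache.spark" %% "spark-hive" % "3.2.1"
)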
2. Create SparkSession with Hive Enabled Support
With Spark Hive support enabled, Spark by default writes the data to the default Hive warehouse location, which is /user/hive/warehouse when you use a Hive cluster. When running locally, it creates the warehouse in the current directory. You can change this behavior using the spark.sql.warehouse.dir configuration while creating the SparkSession.
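For example, here is a minimal sketch that overrides the warehouse location while building the SparkSession (the path /tmp/spark-warehouse is just an illustration, not a required value):

import org.apache.spark.sql.SparkSession

// Sketch: point spark.sql.warehouse.dir at a custom location (example path)
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("CustomWarehouseExample")
  .config("spark.sql.warehouse.dir", "/tmp/spark-warehouse")
  .enableHiveSupport()
  .getOrCreate()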
Since we are running it locally from IntelliJ, it creates the metadata database metastore_db and the warehouse directory spark-warehouse under the current directory.
Let’s create a DataFrame and then use it to create a table from Spark.
// Create SparkSession with Hive support enabled
import org.apache.spark.sql.{SaveMode, SparkSession}

object SaveHive extends App {

  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("SparkCreateTableExample")
    .enableHiveSupport()
    .getOrCreate()

  import spark.implicits._

  // Create DataFrame
  val sampleDF = Seq(
    (1, "James", 30, "M"),
    (2, "Ann", 40, "F"),
    (3, "Jeff", 41, "M"),
    (4, "Jennifer", 20, "F")
  ).toDF("id", "name", "age", "gender")

  // Create Hive internal (managed) table
  sampleDF.write.mode(SaveMode.Overwrite)
    .saveAsTable("employee")
}
As described above, it creates the Hive metastore metastore_db and the Hive warehouse location spark-warehouse in the current directory (you can see this in IntelliJ). The employee table is created inside the warehouse directory.
Also, note that by default it creates the files in Parquet format with Snappy compression.
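To quickly verify the result, you could read the table back with the same SparkSession (a minimal sketch):

// Sketch: read the saved Hive table back and inspect its contents and metadata
spark.sql("SELECT * FROM employee").show()
spark.sql("DESCRIBE FORMATTED employee").show(false)   // shows table location, format, etc.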
I hope this solves your problem: Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
You can find the complete working example at GitHub: Spark Hive Example.
Related Articles
- Spark Types of Tables and Views
- Spark createOrReplaceTempView() Explained
- Time Travel with Delta Tables in Databricks?
- Spark Internal Execution plan
- Broadcast Join in Spark