Solved: Unable to instantiate SparkSession with Hive support because Hive classes are not found

Are you getting the error message below while running a Spark application with Hive enabled?


Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
    at org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:778)
    at com.training.hivetest.App.main(App.java:21)

You get this error because the Spark Hive library is missing from your dependencies. Let's add the following dependencies to your pom.xml file and run a sample example that saves a Spark DataFrame to a Hive table.

1. Unable to instantiate SparkSession with Hive support because Hive classes are not found

To enable Hive support you would need the following dependencies.


<!-- Spark dependencies required for Hive support -->
<dependency>
   <groupId>org.apache.spark</groupId>
   <artifactId>spark-core_2.13</artifactId>
   <version>3.2.1</version>
   <scope>compile</scope>
</dependency>

<dependency>
   <groupId>org.apache.spark</groupId>
   <artifactId>spark-sql_2.13</artifactId>
   <version>3.2.1</version>
   <scope>compile</scope>
</dependency>

<dependency>
   <groupId>org.apache.spark</groupId>
   <artifactId>spark-hive_2.13</artifactId>
   <version>3.2.1</version>
</dependency>
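
If your project uses sbt instead of Maven, the equivalent dependencies would look like the sketch below in build.sbt. This assumes your scalaVersion is set to a 2.13.x release, so that %% resolves the same _2.13 artifacts.


// build.sbt -- equivalent dependencies for sbt users (assumes scalaVersion 2.13.x)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.2.1",
  "org.apache.spark" %% "spark-sql"  % "3.2.1",
  "org.apache.spark" %% "spark-hive" % "3.2.1"  // provides the Hive classes
)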

2. Create SparkSession with Hive Enabled Support

With Spark Hive support enabled, Spark by default writes data to the default Hive warehouse location, which is /user/hive/warehouse on a Hive cluster. When running locally, it creates the warehouse in the current directory. You can change this behavior using the spark.sql.warehouse.dir configuration while creating a SparkSession.
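
For example, here is a minimal sketch of overriding the warehouse location via the builder; the path and app name below are just illustrative.


// Override the default warehouse location (example path)
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("WarehouseDirExample") // illustrative app name
  .config("spark.sql.warehouse.dir", "/tmp/spark-warehouse") // example path
  .enableHiveSupport()
  .getOrCreate()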

Since we are running it locally from IntelliJ, it creates a metadata database (metastore_db) and a warehouse directory (spark-warehouse) under the current directory.

Let’s create a DataFrame and then use it to create a table from Spark.


// Create SparkSession with Hive Enabled Support
import org.apache.spark.sql.{SaveMode, SparkSession}

object SaveHive extends App {
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("SparkCreateTableExample")
    .enableHiveSupport()
    .getOrCreate()

  import spark.implicits._

  // Create DataFrame
  val sampleDF = Seq((1, "James",30,"M"),
    (2, "Ann",40,"F"), (3, "Jeff",41,"M"),
    (4, "Jennifer",20,"F")
    ).toDF("id", "name","age","gender")

  // Create Hive Internal table
  sampleDF.write.mode(SaveMode.Overwrite)
    .saveAsTable("employee")
}

As described above, it creates the Hive metastore metastore_db and the Hive warehouse location spark-warehouse in the current directory (you can see these in IntelliJ). The employee table is created inside the warehouse directory.
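
To verify, you can read the table back within the same SparkSession; this quick check is not part of the original example.


// Query the Hive table we just created (run in the same SparkSession)
spark.sql("SELECT * FROM employee").show()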

Also, note that by default Spark creates the table files in Parquet format with Snappy compression.
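
If you want a different file format, you can set it explicitly on the writer. For example, the sketch below writes the same DataFrame as ORC; the table name employee_orc is just illustrative.


// Write the table as ORC instead of the default Parquet with Snappy
sampleDF.write.mode(SaveMode.Overwrite)
  .format("orc")
  .saveAsTable("employee_orc") // hypothetical table name for illustration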


I hope this solves your "Unable to instantiate SparkSession with Hive support because Hive classes are not found" error.

You can find the complete working example at GitHub: Spark Hive Example.
