When submitting a Spark or PySpark application with spark-submit, we often need to include multiple third-party jars on the classpath. Spark supports several ways to add dependency jars to the classpath.
1. Creating uber or assembly jar
Create an assembly or uber jar that bundles your application classes with all third-party dependencies. You can build it with the Maven Shade plugin or the equivalent sbt-assembly plugin; for PySpark, create a zip file or egg file instead.
By doing this, you don’t have to worry about adding jars to the classpath as all dependencies are already part of your uber jar.
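For the PySpark case, the zip file can be built with Python's standard library. The following is a minimal sketch rather than a prescribed packaging method; the package name `mylib`, the archive name `deps.zip`, and the helper `zip_python_package` are hypothetical and exist only for illustration.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_python_package(package_dir, zip_path):
    """Zip every .py file under package_dir, keeping the package folder name
    as the top-level entry so `import <package>` works from the archive."""
    package_dir = Path(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for source in sorted(package_dir.rglob("*.py")):
            # Store paths relative to the package's parent directory.
            arcname = source.relative_to(package_dir.parent).as_posix()
            archive.write(source, arcname)
    return zip_path


# Hypothetical package created in a temporary folder, just for the demo.
workdir = Path(tempfile.mkdtemp())
package = workdir / "mylib"
package.mkdir()
(package / "__init__.py").write_text("")
(package / "helpers.py").write_text("def double(x):\n    return 2 * x\n")

deps_zip = zip_python_package(package, workdir / "deps.zip")
with zipfile.ZipFile(deps_zip) as archive:
    zipped_names = sorted(archive.namelist())
print(zipped_names)
```

The resulting archive can then be shipped with the application through the --py-files option of spark-submit.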
2. Adding individual jars to a classpath
Multiple third-party jars can be added to the classpath through spark-submit options, spark-defaults.conf, and SparkConf properties. Before using these options, you need to understand the order in which they take effect. The precedence, from highest to lowest, is:
- Properties set directly on the SparkConf take the highest precedence.
- The second precedence goes to the flags passed to spark-submit.
- Finally, properties specified in the spark-defaults.conf file take the lowest precedence.
When you set jars in different places, keep this precedence in mind. Run spark-submit with the --verbose option to get more details about which jars Spark has used.
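The precedence order can be pictured as a layered lookup in which the first source that defines a property wins. The sketch below only models that ordering with Python's ChainMap; it is not how Spark resolves configuration internally, and all three dictionaries hold hypothetical values.

```python
from collections import ChainMap

# Hypothetical values for the same property set in three different places.
spark_defaults_conf = {"spark.jars": "/path/from-defaults.jar", "spark.master": "yarn"}
spark_submit_flags = {"spark.jars": "/path/from-submit.jar"}
spark_conf_in_code = {"spark.jars": "/path/from-sparkconf.jar"}

# ChainMap searches its mappings left to right, so the first one that
# contains a key wins: SparkConf, then spark-submit, then spark-defaults.conf.
effective = ChainMap(spark_conf_in_code, spark_submit_flags, spark_defaults_conf)

print(effective["spark.jars"])    # defined in all three places
print(effective["spark.master"])  # defined only in spark-defaults.conf
```

A property that appears only in the lowest-precedence source is still picked up; it is overridden only when a higher-precedence source sets the same key.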
2.1 Adding jars to the classpath
You can add jars using the spark-submit option --jars. This option accepts a single jar or multiple jars as a comma-separated list.
spark-submit --master yarn --class com.sparkbyexamples.WordCountExample --jars /path/first.jar,/path/second.jar,/path/third.jar your-application.jar
Alternatively, you can also use
2.2 Adding all jars from a folder to classpath
If you have many jars, listing them all in one comma-separated string quickly becomes a nightmare to maintain, especially when you have to update jar versions.
You can use the snippet below to add all jars from a folder automatically. The $(echo /path/*.jar | tr ' ' ',') expression expands to a comma-separated string of all jar paths in the folder.
spark-submit --class com.sparkbyexamples.WordCountExample \
  --jars $(echo /path/*.jar | tr ' ' ',') \
  your-application.jar
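If the job is launched from a Python script instead of a shell, the same comma-separated list can be built with the standard library. This is a sketch under assumptions: the helper names `jars_argument` and `build_submit_command` are hypothetical, and the demo folder contains empty placeholder files, not real jars.

```python
import glob
import os
import tempfile


def jars_argument(folder):
    """Return all *.jar files in folder as one comma-separated string."""
    jar_paths = sorted(glob.glob(os.path.join(folder, "*.jar")))
    return ",".join(jar_paths)


def build_submit_command(main_class, jars_folder, app_jar):
    """Assemble the spark-submit argument list without running it."""
    command = ["spark-submit", "--class", main_class]
    jars = jars_argument(jars_folder)
    if jars:  # skip --jars entirely when the folder has no jars
        command += ["--jars", jars]
    command.append(app_jar)
    return command


# Demo with empty placeholder files in a temporary folder.
demo_folder = tempfile.mkdtemp()
for name in ("first.jar", "second.jar", "notes.txt"):
    open(os.path.join(demo_folder, name), "w").close()

command = build_submit_command(
    "com.sparkbyexamples.WordCountExample", demo_folder, "your-application.jar"
)
print(command)
```

Compared with the echo and tr pipeline, this version omits --jars when the folder is empty and does not split paths that contain spaces. The resulting list can be handed to subprocess.run when spark-submit is available on the PATH.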
2.3 Adding jars with spark-defaults.conf
You can also specify jars in $SPARK_HOME/conf/spark-defaults.conf, but this is not the preferred option, and any libraries you specify here take the lowest precedence.
# Add jars to the driver classpath
spark.driver.extraClassPath /path/first.jar:/path/second.jar
# Add jars to the executor classpath
spark.executor.extraClassPath /path/first.jar:/path/second.jar
On Windows, the jar file names should be separated with a semicolon (;) instead of a colon (:), because the classpath uses the platform's path separator.
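When these entries are generated by a script, the separator does not have to be hard-coded. Java's classpath uses the platform path separator, which Python exposes as os.pathsep (':' on Linux and macOS, ';' on Windows). The sketch below assumes that convention; the helper names and the Windows paths are hypothetical.

```python
import os


def classpath_value(jar_paths, separator=os.pathsep):
    """Join jar paths with the classpath separator for the target platform."""
    return separator.join(jar_paths)


def defaults_conf_lines(jar_paths, separator=os.pathsep):
    """Render the two extraClassPath entries for spark-defaults.conf."""
    value = classpath_value(jar_paths, separator)
    return [
        f"spark.driver.extraClassPath {value}",
        f"spark.executor.extraClassPath {value}",
    ]


# The separator is passed explicitly so the output is the same on any OS.
unix_lines = defaults_conf_lines(["/path/first.jar", "/path/second.jar"], separator=":")
windows_lines = defaults_conf_lines(
    [r"C:\jars\first.jar", r"C:\jars\second.jar"], separator=";"
)
print("\n".join(unix_lines))
print("\n".join(windows_lines))
```

Leaving the separator argument at its default makes the helper follow whichever operating system the script runs on.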
2.4 Using SparkConf properties
Properties set this way take the highest priority among the configuration options.
from pyspark.sql import SparkSession

# Jars listed here are distributed to the YARN containers
spark = SparkSession \
    .builder \
    .appName("SparkByExamples.com") \
    .config("spark.yarn.dist.jars", "/path/first.jar,/path/second.jar") \
    .getOrCreate()
3. Adding jars to Spark Driver
Sometimes you may need to add a jar only to the Spark driver. You can do this by using the --driver-class-path option.
spark-submit --class com.sparkbyexamples.WordCountExample \
  --jars $(echo /path/jars/*.jar | tr ' ' ',') \
  --driver-class-path jar-driver.jar \
  your-application.jar
4. Adding jars to spark-shell
The options for spark-shell are similar to those for spark-submit, so you can use the options described above to add one or more jars to the spark-shell classpath.
spark-shell --driver-class-path /path/to/example.jar:/path/to/another.jar
5. Other options
--conf spark.driver.extraLibraryPath=/path/
# or use below, both do the same
--driver-library-path /path/
Happy Learning !!