Add Multiple Jars to Spark Submit Classpath

When submitting Spark or PySpark applications using spark-submit, you often need to include multiple third-party jars in the classpath. Spark supports several ways to add dependency jars to the classpath.

For example, to add one or more jars to the classpath of a Spark application, you can use the --jars option of the spark-submit command, SparkConf properties, or the spark-defaults.conf file.

1. Create uber or assembly jar

Create an assembly or uber jar that includes your application classes and all third-party dependencies. You can do this with the Maven Shade plugin or the equivalent sbt-assembly plugin; for PySpark, create a zip or egg file instead.

By doing this, you don’t have to worry about adding jars to the classpath, as all dependencies are already part of your uber jar.
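For the PySpark case, one way to bundle pure-Python dependencies is a zip file passed to spark-submit via --py-files. Below is a minimal stdlib-only sketch; the deps/ folder name and the helper's name are assumptions for illustration, not part of any Spark API.

```python
import os
import zipfile

def bundle_py_deps(src_dir, zip_path):
    """Zip every .py file under src_dir (a hypothetical deps/ folder)
    so the archive can be shipped with spark-submit --py-files."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                if name.endswith(".py"):
                    full = os.path.join(root, name)
                    # Store paths relative to src_dir so imports resolve
                    # the same way on the executors.
                    zf.write(full, os.path.relpath(full, src_dir))
    return zip_path
```

You would then submit with something like `spark-submit --py-files deps.zip your_app.py`.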

2. Add jars to a Spark Submit classpath

You can add multiple third-party jars to the classpath using spark-submit, spark-defaults.conf, and SparkConf properties. Before using these options, you need to understand how Spark prioritizes them. They apply in the following order of precedence:

  1. Properties set directly on the SparkConf take the highest precedence.
  2. The second precedence goes to spark-submit options.
  3. Finally, properties specified in spark-defaults.conf file.

When you set jars in different places, keep this precedence in mind. Use spark-submit with the --verbose option to get more details about which jars Spark actually used.
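To make the precedence rule concrete, here is a small illustrative sketch (it does not call Spark; the dictionaries merely model the three sources, merged in ascending precedence so that later sources win):

```python
def effective_conf(defaults_conf, submit_opts, spark_conf):
    """Model Spark's config precedence:
    spark-defaults.conf < spark-submit options < SparkConf."""
    merged = {}
    # Later updates overwrite earlier ones, so SparkConf wins.
    for source in (defaults_conf, submit_opts, spark_conf):
        merged.update(source)
    return merged

# The same key set in all three places resolves to the SparkConf value.
conf = effective_conf(
    {"spark.jars": "defaults.jar"},
    {"spark.jars": "submit.jar"},
    {"spark.jars": "conf.jar"},
)
```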

2.1 Add jars to the classpath using the --jars Option

You can add jars with the spark-submit option --jars; this option accepts a single jar or multiple jars separated by commas.


// Adding jars to the classpath
spark-submit --master yarn \
             --class com.sparkbyexamples.WordCountExample \
             --jars /path/first.jar,/path/second.jar,/path/third.jar \
             your-application.jar

Alternatively, you can also use SparkContext.addJar() to add a jar at runtime.

2.2 Adding all jars from a folder to classpath

You can use the snippet below to add all jars from a folder automatically. Here, the $(echo /path/*.jar | tr ' ' ',') expression creates a comma-separated string by joining all jar names in the folder.


// Adding all jars from a folder to classpath
spark-submit --class com.sparkbyexamples.WordCountExample \
             --jars $(echo /path/*.jar | tr ' ' ',') \
             your-application.jar

This option is handy when you have many jars to append to the classpath. Imagine maintaining tens of jars in a comma-separated list by hand; every time a jar version changes, updating the list becomes a nightmare. Instead, place all required jars in one folder and add them to the Spark classpath using the approach above.
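If you launch spark-submit from a Python script or build tool, the same comma-separated string can be built with the standard library. A minimal sketch (the folder path and function name are illustrative):

```python
import glob
import os

def jar_list(folder):
    """Return a comma-separated, sorted list of all .jar files in
    folder, suitable for passing to spark-submit --jars."""
    jars = sorted(glob.glob(os.path.join(folder, "*.jar")))
    return ",".join(jars)
```

The sorted order keeps the generated command line stable across runs, which helps when diffing logs.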

2.3 Adding jars with spark-defaults.conf

You can also specify jars in $SPARK_HOME/conf/spark-defaults.conf, but this is the least preferable option, since any libraries you specify here take the lowest precedence.


// Add jars to driver classpath
spark.driver.extraClassPath /path/first.jar:/path/second.jar
// Add jars to executor classpath
spark.executor.extraClassPath /path/first.jar:/path/second.jar

Note that on Windows, the jar file paths should be separated with a semicolon (;) instead of a colon (:).
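If you generate these classpath entries from Python, os.pathsep already gives the right separator for the current OS (':' on Unix-like systems, ';' on Windows), so you do not have to hard-code either one. A small sketch:

```python
import os

def classpath_entry(jars):
    """Join jar paths with the platform's classpath separator,
    producing a value usable for spark.driver.extraClassPath."""
    return os.pathsep.join(jars)
```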

2.4 Using SparkConf properties

You can also programmatically add multiple jars to the Spark classpath while creating the SparkSession. Properties set this way take the highest precedence among the options discussed.


# Using SparkConf properties
from pyspark.sql import SparkSession

spark = SparkSession \
        .builder \
        .appName("SparkByExamples.com") \
        .config("spark.yarn.dist.jars", "/path/first.jar,/path/second.jar") \
        .getOrCreate()

3. Add jars to Spark Driver Classpath

Sometimes you may need to add a jar only to the Spark driver; you can do this by using --driver-class-path or --conf spark.driver.extraClassPath.


// Adding jars to Spark Driver
spark-submit --class com.sparkbyexamples.WordCountExample \
             --jars $(echo /path/jars/*.jar | tr ' ' ',') \
             --driver-class-path jar-driver.jar \
             your-application.jar

4. Add jars to spark-shell

Options on spark-shell are similar to spark-submit, so you can use the options described above to add one or more jars to the spark-shell classpath.


// Add jars to spark-shell 
spark-shell --driver-class-path /path/to/example.jar:/path/to/another.jar

5. Other options

You can also set the driver's library path using the options below. Note that these affect the native library search path (java.library.path) rather than the jar classpath.


--conf spark.driver.extraLibraryPath=/path/ 
// Or use below, both do the same
--driver-library-path /path/

Happy Learning !!

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium

This Post Has One Comment

  1. Mich Talebzadeh

    In Yarn mode, it is important that Spark jar files are available throughout the Spark cluster. I have spent a fair bit of time on this and I recommend that you follow this procedure to make sure that the spark-submit job runs ok. Use the spark.yarn.archive configuration option and set that to the location of an archive (you create on HDFS) containing all the JARs in the $SPARK_HOME/jars/ folder, at the root level of the archive. For example:

    1) Create the archive:
    jar cv0f spark-libs.jar -C $SPARK_HOME/jars/ .
    2) Create a directory on HDFS for the jars, accessible to the application:
    hdfs dfs -mkdir /jars
    3) Upload to HDFS:
    hdfs dfs -put spark-libs.jar /jars
    4) For a large cluster, increase the replication count of the Spark archive
    so that you reduce the number of times a NodeManager will do a remote copy:
    hdfs dfs -setrep -w 10 hdfs:///jars/spark-libs.jar
    (Change the number of replicas in proportion to the total number of NodeManagers.)
    5) In the $SPARK_HOME/conf/spark-defaults.conf file, set spark.yarn.archive
    to hdfs://rhes75:9000/jars/spark-libs.jar, similar to below:
    spark.yarn.archive=hdfs://rhes75:9000/jars/spark-libs.jar