Apache Spark comes with the spark-shell command by default, which is used to interact with Spark from the command line. It is typically used to quickly analyze data or test Spark statements interactively. The Spark shell is referred to as a REPL (Read Eval Print Loop). Apache Spark provides spark-shell for Scala, pyspark for Python, and sparkR for R; Java is not supported at this time.
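For reference, and assuming a standard Apache Spark distribution, each shell is launched with its own script from the bin directory of the installation:
./bin/spark-shell   # Scala shell
./bin/pyspark       # Python (PySpark) shell
./bin/sparkR        # R shell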
Spark Shell Key Points –
- Spark shell is a REPL (Read Eval Print Loop) used to quickly test Spark/PySpark statements.
- The Spark shell supports only Scala, Python, and R (Java may have been supported in previous versions).
- The spark-shell command is used to launch Spark with the Scala shell. I have covered this in detail in this article.
- The pyspark command is used to launch Spark with the Python shell, also called PySpark.
- The sparkR command is used to launch Spark with the R language.
- In the Spark shell, Spark provides the spark and sc variables by default: spark is an object of SparkSession and sc is an object of SparkContext (see the quick check after this list).
- In the shell you cannot create your own SparkContext.
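As a quick sanity check once the shell is up, you can inspect these built-in variables directly at the Scala prompt (the comments describe what each typically shows on a default local launch):
scala> spark          // the SparkSession instance created by the shell
scala> sc             // the SparkContext instance created by the shell
scala> spark.version  // the Spark version string
scala> sc.master      // the master URL, e.g. local[*]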
Prerequisites: Before you proceed, make sure you have Apache Spark installed.
1. Launch Spark Shell (spark-shell) Command
Go to the Apache Spark installation directory from the command line, type bin/spark-shell, and press enter. This launches the Spark shell and gives you a Scala prompt to interact with Spark in the Scala language. If you have added Spark to your PATH, just enter spark-shell on the command line or terminal (for Mac users).
./bin/spark-shell
Yields below output.

Let’s understand a few statements from the above screenshot.
- By default, spark-shell creates a Spark context, which internally starts a Web UI with the URL http://localhost:4040. Since it was unable to bind to 4040 in my case, it was created on port 4042.
- The Spark context is created with an app id of the form local-*.
- By default it uses local[*] as the master.
- The Spark context and session are created with the variables 'sc' and 'spark' respectively.
- It shows the Spark, Scala, and Java versions used.
2. Spark Shell Web UI
By default, the Spark Web UI launches on port 4040. If it cannot bind to that port, it tries 4041, 4042, and so on until it succeeds.
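If you prefer a fixed port instead of this incremental fallback, you can set the spark.ui.port configuration when launching the shell (4050 below is just an arbitrary example):
./bin/spark-shell --conf spark.ui.port=4050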
3. Run Spark Statements from Shell
Let’s create a Spark DataFrame with some sample data to validate the installation. Enter the following commands in the Spark Shell in the same order.
import spark.implicits._
val data = Seq(("Java", "20000"), ("Python", "100000"), ("Scala", "3000"))
val df = data.toDF()
df.show()
Yields below output. For more examples on Apache Spark refer to Spark Tutorial with Examples.
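Since toDF() is called without explicit column names, Spark assigns the default names _1 and _2, so df.show() should print output similar to the following:
+------+------+
|    _1|    _2|
+------+------+
|  Java| 20000|
|Python|100000|
| Scala|  3000|
+------+------+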

4. Spark Shell Examples
Let’s look at the different spark-shell command options.
Example 1: Launch in Cluster mode
./bin/spark-shell \
--master yarn \
--deploy-mode cluster
This asks for the Spark driver program to be launched on the cluster. By default, spark-shell uses client mode, which launches the driver on the same machine where you are running the shell. Note that Spark rejects cluster deploy mode for interactive shells, so in practice the shell driver always runs in client mode.
Example 2: In case you want to add dependency jars
./bin/spark-shell \
--master yarn \
--deploy-mode cluster \
--jars file1.jar,file2.jar
Example 3: Adding jars to spark-shell
If you want to add a jar to the spark-shell classpath, use the --driver-class-path option.
spark-shell --driver-class-path /path/to/example.jar:/path/to/another.jar
Example 4: With Configs
./bin/spark-shell \
--master yarn \
--deploy-mode cluster \
--driver-memory 8g \
--executor-memory 16g \
--executor-cores 2 \
--conf "spark.sql.shuffle.partitions=20000" \
--conf "spark.executor.memoryOverhead=5244" \
--conf "spark.memory.fraction=0.8" \
--conf "spark.memory.storageFraction=0.2" \
--jars file1.jar,file2.jar
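Once the shell starts, you can confirm that such configurations took effect from the Scala prompt using the runtime configuration API, for example for the shuffle partitions setting passed above:
scala> spark.conf.get("spark.sql.shuffle.partitions")   // returns "20000" when set as above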
5. Commands in Spark Shell
While interacting with the shell, you may need some help, for example to see which imports are available, the command history, etc. You can get all available commands by using :help
scala> :help
All commands can be abbreviated, e.g., :he instead of :help.
:completions <string>     Output completions for the given string
:edit <id>|<line>         Edit history
:help [command]           Print this summary or command-specific help
:history [num]            Show the history (optional num of commands to show)
:h? <string>              Search the history
:imports [name name ...]  Show import history, identifying sources of names
:implicits [-v]           Show the implicits in scope
:javap <path|class>       Disassemble a file or class name
:line <id>|<line>         Place line(s) at the end of history
:load <path>              Interpret lines in a file
:paste [-raw] [path]      Enter paste mode or paste a file
:power                    Enable power user mode
:quit                     Exit the interpreter
:replay [options]         Reset the repl and replay all previous commands
:require <path>           Add a jar to the classpath
:reset [options]          Reset the repl to its initial state, forgetting all session entries
:save <path>              Save replayable session to a file
:sh <command line>        Run a shell command (result is implicitly => List[String])
:settings <options>       Update compiler options, if possible; see reset
:silent                   Disable/enable automatic printing of results
:type [-v] <expr>         Display the type of an expression without evaluating it
:kind [-v] <type>         Display the kind of a type. see also :help kind
:warnings                 Show the suppressed warnings from the most recent line which had any
6. Accessing Environment Variables
Sometimes you may need to access environment variables in the shell; you can do this with the System.getenv() method. Note that this is a Java method, but you can use it from Scala.
For example, on a UNIX shell, set a variable.
export ENV_NAME='SparkByExamples.com'
Now open spark-shell and access it from the Scala prompt.
scala> System.getenv("ENV_NAME")
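If you need to list all environment variables rather than a single one, here is a minimal sketch for the Scala 2.12 shell shipped with Spark 3.x, iterating the java.util.Map returned by System.getenv() via the standard JavaConverters:
scala> import scala.collection.JavaConverters._
scala> System.getenv().asScala.foreach { case (k, v) => println(s"$k=$v") }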
7. Run Unix Shell Script File
In case you want to run a Unix shell script (.sh file) from the Scala prompt, you can do this by using :sh <file-name>. I have an nnk.sh file with the content echo 'SparkByExamples.com' > nnk.out
scala> :sh /Users/admin/nnk.sh
res0: scala.tools.nsc.interpreter.ProcessResult = `/Users/admin/nnk.sh` (0 lines, exit 0)
This executes the nnk.sh file, which creates an nnk.out file with the content 'SparkByExamples.com'.
8. Load Scala Script
By using :load from the shell, you can load a Scala file. First, create a Scala file; I will be creating nnk.scala with the content println("SparkByExamples.com")
Now let’s launch the shell and load this Scala program. This comes in handy when you have statements in a Scala file and want to run them from the shell.
scala> :load nnk.scala
Loading nnk.scala...
SparkByExamples.com
scala>
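The loaded file can also contain Spark statements. For example, a hypothetical count.scala with the following content creates a small DataFrame and prints its row count:
// count.scala (hypothetical example)
import spark.implicits._
val langs = Seq("Java", "Python", "Scala").toDF("language")
println(s"rows = ${langs.count()}")
Load it the same way:
scala> :load count.scala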
9. Spark Shell Options
Like any other shell command, the Apache Spark shell also provides several options; you can get all available options with -h (help). Below are some of the important ones.
Spark Shell Options

| Option | Description |
|---|---|
| -I <file> | Preload <file>, enforcing line-by-line interpretation. |
| --master MASTER_URL | spark://host:port, mesos://host:port, yarn, k8s://https://host:port, or local (Default: local[*]). |
| --deploy-mode DEPLOY_MODE | Whether to launch the driver program locally ("client") or on one of the worker machines inside the cluster ("cluster") (Default: client). |
| --class CLASS_NAME | The main class you want to run. Applicable only to Java/Scala apps. |
| --py-files PY_FILES | Comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH. Applicable only to Python. |
| --name NAME | The name of your application. |
| --jars JARS | Comma-separated list of jars to include on the driver and executor classpaths. |
| --packages | Comma-separated list of Maven coordinates of jars to include on the driver and executor classpaths. |
| --files FILES | Comma-separated list of files to be placed in the working directory of each executor. |
For the complete list of spark-shell options, use the -h option.
./bin/spark-shell -h
This yields the output below. If you look closely, most of the options are similar to those of the spark-submit command.

Conclusion
In this article, you have learned what the Spark shell is, how to use it with examples, and the different options available inside the shell.