In my last article, I explained how to submit a job using the spark-submit command. Alternatively, we can use the Spark standalone master REST API (RESTful) to submit a Scala or Python (PySpark) job or application.
In this article, I will explain how to submit Scala and PySpark (Python) jobs using the REST API, get the status of the application, and finally kill the application, with examples.
1. Spark Standalone mode REST API
Spark standalone mode provides a REST API to run a Spark job. Below I will explain how to use some of these REST APIs from the curl command, but in a real application you can integrate them with your web UI application or any RESTful service.
1.1 Enable REST API
By default the REST API service is disabled; you can enable it by adding the below configuration to the spark-defaults.conf file.
# Enable REST API
spark.master.rest.enabled true
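# The REST server listens on port 6066 by default (configurable with spark.master.rest.port)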
After you add the property, make sure you restart the master and worker services for this change to take effect.
./sbin/start-master.sh
./sbin/start-slave.sh spark://192.168.1.1:7077
And make sure the standalone cluster is up and running by accessing the below URL. Replace the IP address and port according to your setup.
http://192.168.1.1:8080
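If you prefer to check this from a script rather than a browser, the standalone master web UI also serves its status as JSON at /json. Below is a minimal Python sketch using the requests library (an illustration only, assuming the default web UI port 8080; the exact fields may vary between Spark versions):
import requests

# Query the standalone master web UI JSON endpoint (default web UI port 8080)
info = requests.get("http://192.168.1.1:8080/json/").json()
print("Master status:", info["status"])          # e.g. ALIVE
print("Registered workers:", len(info["workers"]))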
If you do not enable this property, you will get the following error when you attempt to submit the application.
This Page Cannot Be Displayed
The system cannot communicate with the external server (spark-master-ip).
The Internet server may be busy, maybe permanently down, or maybe unreachable because of network problems.
Please check the spelling of the Internet address entered.
If it is correct, try this request later.
1.2 Spark Submit REST API Request
We use the REST API /v1/submissions/create to submit an application to the standalone cluster. With this request you need to provide, to name a few, the class you want to run in mainClass, any command-line arguments in appArgs, and the location of the jar file in appResource.
As said in the beginning, here I explain how to use the REST API from the curl command.
curl -X POST http://192.168.1.1:6066/v1/submissions/create --header "Content-Type:application/json;charset=UTF-8" --data '{
  "appResource": "/home/hduser/sparkbatchapp.jar",
  "sparkProperties": {
    "spark.executor.memory": "8g",
    "spark.master": "spark://192.168.1.1:7077",
    "spark.driver.memory": "8g",
    "spark.driver.cores": "2",
    "spark.eventLog.enabled": "false",
    "spark.app.name": "Spark REST API - PI",
    "spark.submit.deployMode": "cluster",
    "spark.jars": "/home/user/spark-examples_versionxx.jar",
    "spark.driver.supervise": "true"
  },
  "clientSparkVersion": "2.4.0",
  "mainClass": "org.apache.spark.examples.SparkPi",
  "environmentVariables": {
    "SPARK_ENV_LOADED": "1"
  },
  "action": "CreateSubmissionRequest",
  "appArgs": [
    "80"
  ]
}'
This submits the job to the cluster and returns the following response, which contains the application id in the submissionId field.
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20200923223841-0001",
  "serverSparkVersion" : "2.4.0",
  "submissionId" : "driver-20200923223841-0001",
  "success" : true
}
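As mentioned earlier, in a real application you may want to trigger this from your own code instead of curl. Below is a minimal sketch of the same create request using Python's requests library; it is just an illustration assuming the same master host, port 6066, and payload shown above.
import requests

payload = {
    "appResource": "/home/hduser/sparkbatchapp.jar",
    "sparkProperties": {
        "spark.executor.memory": "8g",
        "spark.master": "spark://192.168.1.1:7077",
        "spark.driver.memory": "8g",
        "spark.driver.cores": "2",
        "spark.eventLog.enabled": "false",
        "spark.app.name": "Spark REST API - PI",
        "spark.submit.deployMode": "cluster",
        "spark.jars": "/home/user/spark-examples_versionxx.jar",
        "spark.driver.supervise": "true"
    },
    "clientSparkVersion": "2.4.0",
    "mainClass": "org.apache.spark.examples.SparkPi",
    "environmentVariables": {"SPARK_ENV_LOADED": "1"},
    "action": "CreateSubmissionRequest",
    "appArgs": ["80"]
}

# POST the submission request to the standalone master REST endpoint
response = requests.post(
    "http://192.168.1.1:6066/v1/submissions/create",
    json=payload,
)
submission_id = response.json()["submissionId"]
print(submission_id)   # e.g. driver-20200923223841-0001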
1.3 Submitting PySpark using REST API
The below example submits the PySpark example spark_pi.py located at /home/user/ with the command-line argument 80.
curl -X POST http://192.168.1.1:6066/v1/submissions/create --header "Content-Type:application/json;charset=UTF-8" --data '{
  "appResource": "file:/home/user/spark_pi.py",
  "sparkProperties": {
    "spark.executor.memory": "8g",
    "spark.master": "spark://192.168.1.1:7077",
    "spark.driver.memory": "8g",
    "spark.driver.cores": "2",
    "spark.eventLog.enabled": "false",
    "spark.app.name": "Spark REST API - PI",
    "spark.submit.deployMode": "cluster",
    "spark.driver.supervise": "true"
  },
  "clientSparkVersion": "2.4.0",
  "mainClass": "org.apache.spark.deploy.SparkSubmit",
  "environmentVariables": {
    "SPARK_ENV_LOADED": "1"
  },
  "action": "CreateSubmissionRequest",
  "appArgs": [ "/home/user/spark_pi.py", "80" ]
}'
1.4 Status of the Job from REST API
You can either use the Spark UI to monitor your job, or you can submit the following REST API request to get the status of the application. Make sure you specify the driver application id (the submissionId) you got from the previous request.
curl http://192.168.1.1:6066/v1/submissions/status/driver-20200923223841-0001
This results in the below response.
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "FINISHED",
  "serverSparkVersion" : "2.4.0",
  "submissionId" : "driver-20200923223841-0001",
  "success" : true,
  "workerHostPort" : "192.168.1.1:38451",
  "workerId" : "worker-20200923223841-192.168.1.2-34469"
}
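If you want to monitor the job programmatically instead of watching the Spark UI, below is a minimal Python sketch that polls the status endpoint until the driver reaches a terminal state (assuming the same master host and the submissionId returned by the create request):
import time
import requests

submission_id = "driver-20200923223841-0001"
status_url = f"http://192.168.1.1:6066/v1/submissions/status/{submission_id}"

# Poll every 10 seconds until the driver leaves the SUBMITTED/RUNNING states
while True:
    state = requests.get(status_url).json().get("driverState", "UNKNOWN")
    print("driverState:", state)
    if state not in ("SUBMITTED", "RUNNING", "RELAUNCHING"):
        break
    time.sleep(10)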
1.5 Kill the Job
Sometimes we may need to kill a running job; below is the REST API request to kill it.
# Kill the Job
curl -X POST http://192.168.1.1:6066/v1/submissions/kill/driver-20200923223841-0001
This results in the below response.
{
  "action" : "KillSubmissionResponse",
  "message" : "Kill request for driver-20200923223841-0001 submitted",
  "serverSparkVersion" : "2.4.0",
  "submissionId" : "driver-20200923223841-0001",
  "success" : true
}
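For completeness, here is the same kill request from Python, again just a minimal sketch assuming the same host and submissionId:
import requests

submission_id = "driver-20200923223841-0001"

# The kill endpoint is a POST with no request body
response = requests.post(
    f"http://192.168.1.1:6066/v1/submissions/kill/{submission_id}"
)
print(response.json()["message"])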
2. Using the REST API for YARN Resource Manager
Submitting an application to YARN using the REST API is a little tricky, and I will cover it in a future article once I am able to submit successfully.
In the meantime, if you do not succeed, try using Cloudera Livy. According to the Livy documentation, it supports the following (see the sketch after this list):
- Interactive Scala, Python, and R shells
- Batch submissions in Scala, Java, Python
- Multiple users can share the same server (impersonation support)
- Can be used for submitting jobs from anywhere with REST
- Does not require any code change to your programs
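As an illustration only (I have not verified this on the cluster used in this article), below is a minimal Python sketch of submitting the same spark_pi.py example as a Livy batch, assuming a Livy server is running on its default port 8998 and that the file path is accessible to it:
import requests

livy_url = "http://192.168.1.1:8998/batches"   # assumed Livy host; adjust for your setup

# Submit spark_pi.py as a Livy batch job
payload = {"file": "local:/home/user/spark_pi.py", "args": ["80"]}
batch = requests.post(livy_url, json=payload).json()
print(batch["id"], batch["state"])

# Check the batch state later
state = requests.get(f"{livy_url}/{batch['id']}/state").json()
print(state["state"])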
Conclusion
In this article, you have learned how to submit a Spark application using the standalone mode REST API, get the status of the application, and kill it, and finally got some pointers on how to use the YARN REST API and Livy.
Happy Learning !!
Related Articles
- Spark Submit Command Explained with Examples
- Add Multiple Jars to Spark Submit Classpath?
- Spark – Different Types of Issues While Running in Cluster?
- What does setMaster(local[*]) mean in Spark
- Spark Shell Command Usage with Examples
- Spark Get the Current SparkContext Settings
- Spark – Using XStream API to write complex XML structures
- Spark Read XML file using Databricks API
- Spark SQL Sampling with Examples
Comment: I'm using a PySpark standalone setup and run jobs like this: .\submit-job.cmd E:\Test\Test.py. Is it possible to submit a job with the help of the REST API as mentioned in the tutorial? I couldn't find the web API service URL, but my master and worker run at Spark Master at spark://192.168.0.147:7077 and Spark Worker at 192.168.0.147:56594 respectively. I am unable to find the Web API.
Reply: In order to submit Spark jobs using an API, you need to set up a third-party service as described in this article.
Comment: I'm using a Windows machine and have created a standalone setup. Are the third-party setups mentioned, like Livy and a file server, applicable to the Windows platform? Most of the tutorials are related to Linux.
Reply: I have not tried it, but I believe you should be able to use Livy on Windows. I will write an article on this soon.