
Spark Deploy Modes – Client vs Cluster Explained


The difference between client and cluster deploy modes in Spark/PySpark is one of the most frequently asked Spark interview questions. The deploy mode (--deploy-mode) specifies where the driver program of your Spark application/job runs. Spark provides two deploy modes, client and cluster, and you can use either of them to run Java, Scala, and PySpark applications.

Using spark-submit --deploy-mode <client/cluster>, you can specify where to run the Spark application driver program.

1. Spark/PySpark Deploy Modes

Value | Description
cluster | In cluster mode, the driver runs on one of the worker nodes, and this node shows as a driver on the Spark Web UI of your application. Cluster mode is typically used to run production jobs.
client | In client mode, the driver runs locally on the machine from which you submit your application using the spark-submit command. Client mode is mainly used for interactive work and debugging. Note that in client mode only the driver runs locally; all tasks still run on the cluster worker nodes.

If you want to know the deploy mode of a running or completed Spark application, open the application's Spark Web UI (for completed applications, through the Spark History Server UI) and check the spark.submit.deployMode property on the Environment tab.
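The same property can also be read in code. The following is a minimal sketch, not an official recipe: the helper name get_deploy_mode and the sample dictionary are made up for illustration, and the fallback value "client" is an assumption based on the default listed in Spark's configuration documentation. The helper accepts any object with a dict-style get(key, default) method, so a plain dictionary stands in for a real Spark configuration here.

```python
DEPLOY_MODE_KEY = "spark.submit.deployMode"


def get_deploy_mode(conf, default="client"):
    """Look up the deploy mode on any object exposing .get(key, default).

    The "client" fallback is an assumption; check the default for your
    Spark version before relying on it.
    """
    return conf.get(DEPLOY_MODE_KEY, default)


# Hypothetical stand-in for a real configuration, so the sketch runs without Spark.
example_conf = {"spark.app.name": "demo", DEPLOY_MODE_KEY: "cluster"}
print(get_deploy_mode(example_conf))

# In a live PySpark session, the equivalent lookup would be:
# get_deploy_mode(spark.sparkContext.getConf())
```

Passing the configuration object in, rather than creating a session inside the helper, keeps the lookup easy to try out away from a cluster.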

2. Client Deploy Mode in Spark

In client mode, the Spark driver component of the application runs on the machine from which the job is submitted.

In a typical Cloudera cluster, you submit the Spark application from the edge node, so the Spark driver runs on the edge node.

In a Spark Standalone cluster, if you submit the application from the master node (a dedicated server), the driver runs there with dedicated resources.


spark-submit --deploy-mode client --driver-memory xxxx  ......

Note: Network overhead. In client mode, data has to move across the network between the driver (on the submitting machine) and the worker nodes in the cluster, so depending on the network latency you may notice performance degradation.

3. Cluster Deploy Mode in Spark

In cluster deploy mode, the driver program is launched on one of the available nodes in the Spark cluster. Cluster deployment is mostly used for large data sets where the job takes a few minutes or hours to complete.


spark-submit --deploy-mode cluster --driver-memory xxxx  ........
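If you launch jobs from a script, the two commands shown in this article differ only in the --deploy-mode value. Here is a rough sketch of assembling that command in Python. The function name build_spark_submit, the "2g" driver memory, and the my_job.py file name are hypothetical placeholders, not values from this article, and a real submission would normally need further options (such as --master) that the commands above leave elided.

```python
import shlex

VALID_DEPLOY_MODES = ("client", "cluster")


def build_spark_submit(deploy_mode, driver_memory, application, app_args=()):
    """Return the spark-submit argument list for the given deploy mode."""
    if deploy_mode not in VALID_DEPLOY_MODES:
        raise ValueError(
            f"deploy mode must be one of {VALID_DEPLOY_MODES}, got {deploy_mode!r}"
        )
    return [
        "spark-submit",
        "--deploy-mode", deploy_mode,
        "--driver-memory", driver_memory,
        application,
        *app_args,
    ]


# Hypothetical values: "2g" and "my_job.py" are placeholders for illustration.
cmd = build_spark_submit("cluster", "2g", "my_job.py")
print(shlex.join(cmd))

# To actually launch the job (requires spark-submit on the PATH):
# import subprocess; subprocess.run(cmd, check=True)
```

Validating the mode up front means a typo fails in the script itself instead of surfacing later as a spark-submit error.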

Hope you like the above explanation of the differences between Spark/PySpark cluster and client deploy modes!

Conclusion

In this article, you have learned the difference between Spark/PySpark client and cluster deploy modes. In client mode, Spark runs the driver on the local machine from which you submit the application, and in cluster mode, it runs the driver on one of the nodes in the cluster.

Happy Learning !!
