While running spark jobs, you may come across java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0 error with below stack trace. This error occurs when you try to create multiple spark contexts.
java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1354)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:81)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Ideally we should not create multiple spark context’s. Some times unknowingly, our code might cause to create multiple spark context and this would be very hard to trouble shoot and fix. some cases the code might work in standalone and fails on cluster. So carefully refactor the code not to create multiple spark context’s
In my case I’ve created spark context at instance level on driver program and try to use the context on dataframe map transformation and this cause broadcast error. In order to resolve this, I had created SparkContext in a main method and have passed it to a method where its required in map transformation
In another case, when I tried to crate SparkContext and Streamingcontext from scratch I was getting this error. Below is the code how to create StreamingContext from existing Sparkcontext.
val spark = val spark: SparkSession = SparkSession.builder()
.master("local[1]")
.appName("SparkByExamples.com")
.getOrCreate()
val ssc = new StreamingContext(spark.sparkContext, Seconds(1))
Hope this helps !!