java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0

While running Spark jobs, you may come across the error java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0 with the stack trace below. This error commonly occurs when you create multiple Spark contexts.


java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1354)
        at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
        at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
        at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
        at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
        at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:81)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Ideally, we should not create multiple Spark contexts. Sometimes our code unknowingly creates more than one, and this can be very hard to troubleshoot and fix. In some cases the code works in standalone mode but fails on a cluster. So carefully refactor the code so that it does not create multiple Spark contexts.
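One way to avoid accidentally constructing a second context is to ask Spark for the one that already exists. Below is a minimal sketch, assuming a local master; the object name ReuseContext and the method contextsAreShared are hypothetical names used only for illustration. SparkContext.getOrCreate returns the already-running context if there is one, rather than building another.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object, not part of the original job.
object ReuseContext {

  // Returns true when two getOrCreate calls hand back the same SparkContext.
  def contextsAreShared(): Boolean = {
    val conf = new SparkConf()
      .setMaster("local[1]")              // assumption: local run for illustration
      .setAppName("SparkByExamples.com")
    val first = SparkContext.getOrCreate(conf)
    // No second context is built here; the running one is returned.
    val second = SparkContext.getOrCreate(conf)
    try first eq second
    finally first.stop()
  }

  def main(args: Array[String]): Unit =
    println(contextsAreShared())
}
```

Calling getOrCreate everywhere instead of new SparkContext(...) means helper classes and libraries pick up the driver's context rather than starting their own.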

In my case, I had created the SparkContext at instance level in the driver program and tried to use it inside a DataFrame map transformation, which caused the broadcast error. To resolve this, I created the SparkContext in the main method and passed it to the method where it is required for the map transformation.
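The fix described above can be sketched roughly as follows. This is not my original job, only an illustration under assumptions: the object BroadcastSafeJob, the method toUpper and the sample data are hypothetical. The session is created once in main and handed to the method as a parameter, and the function passed to map only touches its own argument, so it never refers to the session or context.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical example object showing the "create once, pass it in" pattern.
object BroadcastSafeJob {

  // Receives the session as a parameter instead of reading an instance-level field.
  def toUpper(spark: SparkSession, words: Seq[String]): Dataset[String] = {
    import spark.implicits._   // encoders needed for toDS() and map
    // The lambda uses only its own argument, so it does not reference
    // the session or the SparkContext.
    words.toDS().map(_.toUpperCase)
  }

  def main(args: Array[String]): Unit = {
    // The one and only session for this application, created in main.
    val spark = SparkSession.builder()
      .master("local[1]")                 // assumption: local run for illustration
      .appName("SparkByExamples.com")
      .getOrCreate()
    try {
      toUpper(spark, Seq("spark", "broadcast")).show()
    } finally {
      spark.stop()
    }
  }
}
```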

In another case, I was getting this error when I tried to create both a SparkContext and a StreamingContext from scratch. The code below shows how to create a StreamingContext from an existing SparkContext.


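import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

val spark: SparkSession = SparkSession.builder()
      .master("local[1]")
      .appName("SparkByExamples.com")
      .getOrCreate()
// Reuse the session's existing SparkContext instead of creating a new one
val ssc = new StreamingContext(spark.sparkContext, Seconds(1))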

Hope this helps!

Naveen Nelamali
