Sparkling Water – java.lang.NoClassDefFoundError: org/apache/spark/repl/Main

While running H2O Sparkling Water (machine learning models) on a Spark cluster, you may hit the exception java.lang.NoClassDefFoundError: org/apache/spark/repl/Main, after which the program fails.

I ran into this issue when running Sparkling Water with the following configuration:

  • Sparkling Water – sparkling-water-3.28.0.3-1-2.4
  • Spark – spark-2.4.4-bin-hadoop2.7 (with winutils)
  • Scala – 2.11.11
  • OS – Windows 10

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/repl/Main$
	at org.apache.spark.repl.h2o.H2OIMain$._classOutputDirectory$lzycompute(H2OIMain.scala:51)
	at org.apache.spark.repl.h2o.H2OIMain$._classOutputDirectory(H2OIMain.scala:50)
	at org.apache.spark.repl.h2o.H2OIMain$.classOutputDirectory(H2OIMain.scala:60)
	at org.apache.spark.repl.h2o.H2OInterpreter$.classOutputDirectory(H2OInterpreter.scala:84)
	at org.apache.spark.repl.h2o.H2OInterpreter.createSettings(H2OInterpreter.scala:66)
	at org.apache.spark.repl.h2o.BaseH2OInterpreter.initializeInterpreter(BaseH2OInterpreter.scala:100)
	at org.apache.spark.repl.h2o.BaseH2OInterpreter.<init>(BaseH2OInterpreter.scala:290)
	at org.apache.spark.repl.h2o.H2OInterpreter.<init>(H2OInterpreter.scala:38)
	at water.api.scalaInt.ScalaCodeHandler.createInterpreterInPool(ScalaCodeHandler.scala:145)
	at water.api.scalaInt.ScalaCodeHandler$$anonfun$initializeInterpreterPool$1.apply(ScalaCodeHandler.scala:139)
	at water.api.scalaInt.ScalaCodeHandler$$anonfun$initializeInterpreterPool$1.apply(ScalaCodeHandler.scala:138)
	at scala.collection.immutable.Range.foreach(Range.scala:160)
	at water.api.scalaInt.ScalaCodeHandler.initializeInterpreterPool(ScalaCodeHandler.scala:138)
	at water.api.scalaInt.ScalaCodeHandler.<init>(ScalaCodeHandler.scala:42)
	at water.api.scalaInt.ScalaCodeHandler$.registerEndpoints(ScalaCodeHandler.scala:171)
	at water.api.CoreRestAPI$.registerEndpoints(CoreRestAPI.scala:32)
	at water.api.RestAPIManager.register(RestAPIManager.scala:39)
	at water.api.RestAPIManager.registerAll(RestAPIManager.scala:31)
	at org.apache.spark.h2o.backends.internal.InternalH2OBackend.init(InternalH2OBackend.scala:43)
	at org.apache.spark.h2o.H2OContext$H2OContextClientBased.initBackend(H2OContext.scala:450)
	at org.apache.spark.h2o.H2OContext.init(H2OContext.scala:150)
	at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:608)
	at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:636)
	at com.sparkbyexamples.spark.SparklingWaterExample$.delayedEndpoint$com$sparkbyexamples$spark$SparklingWaterExample$1(SparklingWaterExample.scala:13)
	at com.sparkbyexamples.spark.SparklingWaterExample$delayedInit$body.apply(SparklingWaterExample.scala:6)
	at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
	at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
	at scala.App$$anonfun$main$1.apply(App.scala:76)
	at scala.App$$anonfun$main$1.apply(App.scala:76)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
	at scala.App$class.main(App.scala:76)
	at com.sparkbyexamples.spark.SparklingWaterExample$.main(SparklingWaterExample.scala:6)
	at com.sparkbyexamples.spark.SparklingWaterExample.main(SparklingWaterExample.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.repl.Main$
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 34 more
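
The "Caused by" line is the root of the problem: the application class loader cannot find org.apache.spark.repl.Main$. As a quick diagnostic, the sketch below checks whether that class is visible on the classpath before H2OContext is created. The names ReplClasspathCheck and isOnClasspath are hypothetical helpers made up for this illustration; they are not part of Spark or Sparkling Water.

```scala
object ReplClasspathCheck {

  // Hypothetical helper: returns true if the named class can be located by
  // this object's class loader. The class is looked up but not initialized.
  def isOnClasspath(className: String): Boolean =
    try {
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      case _: ClassNotFoundException | _: NoClassDefFoundError => false
    }

  def main(args: Array[String]): Unit = {
    // The class named in the stack trace above.
    val replMain = "org.apache.spark.repl.Main$"
    if (isOnClasspath(replMain))
      println("Found " + replMain + " on the classpath")
    else
      println("Missing " + replMain + ": the Spark REPL classes are not on the classpath")
  }
}
```

If this reports the class as missing when run with the same classpath as your application, the exception above is expected as soon as Sparkling Water initializes its Scala interpreter.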

Solution

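In my case, adding the Spark REPL Maven dependency resolved the issue, and I have not seen this exception since. The missing class org.apache.spark.repl.Main belongs to the spark-repl module, which the Sparkling Water Scala interpreter (the org.apache.spark.repl.h2o classes in the stack trace) loads at startup, so that module has to be on the application classpath. Use the artifact that matches your Scala binary version (2.11 here) and your Spark version (2.4.4 here).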

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-repl_2.11</artifactId>
    <version>2.4.4</version>
</dependency>
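
If your project is built with sbt rather than Maven, the same dependency would look roughly like the fragment below. This is a sketch that assumes the configuration listed at the top of this post (Scala 2.11.11, Spark 2.4.4); I only tested the Maven form, so adjust the versions to match your setup.

```scala
// build.sbt (fragment) -- assumed sbt equivalent of the Maven dependency above

// Scala version from the configuration in this post
scalaVersion := "2.11.11"

// %% appends the Scala binary version, so this resolves to spark-repl_2.11
libraryDependencies += "org.apache.spark" %% "spark-repl" % "2.4.4"
```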

If this does not resolve your issue, please comment with the Spark, Sparkling Water, and Scala versions you are using, and I will be happy to help.

Happy Learning !!

Naveen (NNK)