Sparkling Water – H2OConf

org.apache.spark.h2o.H2OConf (H2O configuration) is a wrapper around SparkConf: it inherits all SparkConf properties and adds additional properties specific to the Sparkling Water (H2O) cluster.

Creating H2OConf

All H2O cluster configuration should be provided through the org.apache.spark.h2o.H2OConf class or on the command line. The following constructors are used to create an H2OConf object in Sparkling Water.

Syntax


H2OConf(sparkSession: SparkSession)
H2OConf(sc: SparkContext)
H2OConf(sparkConf: SparkConf)

These constructors are self-explanatory: to create an H2OConf object, first create a SparkSession, SparkContext, or SparkConf object and pass it as an argument to the H2OConf constructor.


import org.apache.spark.sql.SparkSession
import org.apache.spark.h2o.{H2OConf, H2OContext}

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SparkByExamples.com")
  .getOrCreate()

val h2oConf = new H2OConf(spark)
val h2oContext = H2OContext.getOrCreate(spark, h2oConf)  // pass the H2OConf we just created
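
Because H2OConf wraps SparkConf, it can also be built from a SparkConf directly, and H2O properties can be supplied as plain Spark configuration entries. A minimal sketch; the spark.ext.h2o.cloud.name key and the cloudName getter are assumed from recent Sparkling Water releases:


import org.apache.spark.SparkConf
import org.apache.spark.h2o.H2OConf

val sparkConf = new SparkConf()
  .setMaster("local[1]")
  .setAppName("SparkByExamples.com")
  .set("spark.ext.h2o.cloud.name", "CloudName1")  // assumed key; equivalent to setCloudName("CloudName1")

val h2oConf = new H2OConf(sparkConf)
println(h2oConf.cloudName)  // assumed getter; prints Some(CloudName1)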

Sparkling Water H2OConf Methods

setCloudName()

Using setCloudName() we can give a name to the H2O cluster.


val h2oConf = new H2OConf(spark).setCloudName("CloudName1")
val h2oContext = H2OContext.getOrCreate(spark, h2oConf)

setInternalClusterMode()

setInternalClusterMode() runs the H2O cluster inside the Spark executors; this internal backend is the default mode.

val h2oConf = new H2OConf(spark).setInternalClusterMode()
val h2oContext = H2OContext.getOrCreate(spark, h2oConf)

setExternalClusterMode()

setExternalClusterMode() connects Sparkling Water to an H2O cluster running separately from the Spark cluster (the external backend).

val h2oConf = new H2OConf(spark).setExternalClusterMode()
val h2oContext = H2OContext.getOrCreate(spark, h2oConf)

useAutoClusterStart()

useAutoClusterStart() is used to run the H2O cluster in the automatic start mode of the external backend, where Sparkling Water starts the external H2O cluster on your behalf.
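
A minimal sketch of the automatic start mode (the driver jar path and node count are placeholder values):


val h2oConf = new H2OConf(spark)
  .setExternalClusterMode()
  .useAutoClusterStart()                       // Sparkling Water launches the external H2O cluster itself
  .setH2ODriverPath("/path/to/h2odriver.jar")  // placeholder path to the H2O driver jar
  .setClusterSize(3)                           // placeholder: request 3 H2O nodes
val h2oContext = H2OContext.getOrCreate(spark, h2oConf)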

useManualClusterStart()

useManualClusterStart() is used to run the H2O cluster in the manual start mode of the external backend, where you start the H2O cluster yourself and Sparkling Water connects to it.
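
A minimal sketch of the manual start mode, assuming an H2O cluster named CloudName1 is already running outside of Spark:


val h2oConf = new H2OConf(spark)
  .setExternalClusterMode()
  .useManualClusterStart()     // connect to an H2O cluster started outside of Spark
  .setCloudName("CloudName1")  // name of the already-running H2O cluster
val h2oContext = H2OContext.getOrCreate(spark, h2oConf)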

setH2ODriverPath()

When you run the H2O cluster on the external backend in automatic start mode, use setH2ODriverPath() to provide the path to the H2O driver jar.
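
For example (the jar path is a placeholder):


val h2oConf = new H2OConf(spark)
  .setExternalClusterMode()
  .useAutoClusterStart()
  .setH2ODriverPath("/path/to/h2odriver.jar")  // placeholder path to the H2O driver jar
val h2oContext = H2OContext.getOrCreate(spark, h2oConf)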

setClusterSize()

Using setClusterSize() we can set the size of the H2O cluster, i.e., the number of H2O nodes to start on the external backend.
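
For example, to request a three-node external H2O cluster (the size is a placeholder):


val h2oConf = new H2OConf(spark)
  .setExternalClusterMode()
  .useAutoClusterStart()
  .setClusterSize(3)  // placeholder: start 3 H2O nodes
val h2oContext = H2OContext.getOrCreate(spark, h2oConf)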

Happy Learning !!
