Sparkling Water – H2OContext

H2OContext is the entry point to Sparkling Water; it is used to connect to an external H2O cluster or to start a standalone, local cluster. The H2OContext class provides methods to transform or convert an RDD/DataFrame to an H2OFrame and an H2OFrame back to an RDD/DataFrame, along with implicit conversions.

Creating H2OContext object

An H2OContext is created using the builder method getOrCreate(), which takes a SparkSession object as a parameter and, optionally, an H2OConf object.


val spark = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExample")
    .getOrCreate()

val h2oContext = H2OContext.getOrCreate(spark)

Using H2OConf with H2O context

H2OConf (H2O configuration) is a wrapper around SparkConf; it inherits all SparkConf properties and adds properties specific to the H2O Sparkling Water cluster.


val h2oConf = new H2OConf(spark).setCloudName("CloudName1")
val h2oContext = H2OContext.getOrCreate(spark, h2oConf)

The above example assigns the cloud name “CloudName1” to the H2O cluster. For more details, please read Usage of H2O Configuration.

asH2OFrame() – Convert DataFrame to H2OFrame

Use the asH2OFrame() method of H2OContext to convert or transform a Spark DataFrame to a Sparkling Water H2OFrame.

Syntax


asH2OFrame(df: DataFrame): H2OFrame
asH2OFrame(df: DataFrame, frameName: String)

Example


val zipCodes = "src/main/resources/small_zipcode.csv"
val zipCodesDF = spark.read.option("header", "true")
    .option("inferSchema", "true")
    .csv(zipCodes)

val h2oContext = H2OContext.getOrCreate(spark)
val h2oFrame = h2oContext.asH2OFrame(zipCodesDF)
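
asH2OFrame() also has an overload that assigns a name to the resulting frame. A minimal sketch, where "zipcodes_frame" is just an illustrative frame name:


// Convert the DataFrame and register the H2OFrame under a given name
val namedH2OFrame = h2oContext.asH2OFrame(zipCodesDF, "zipcodes_frame")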

asDataFrame() – Convert H2OFrame to DataFrame

Use the asDataFrame() method of H2OContext to convert or transform a Sparkling Water H2OFrame to a Spark DataFrame.

Syntax


 def asDataFrame[T <: Frame](fr: T, copyMetadata: Boolean = true): DataFrame

Example


val h2oContext = H2OContext.getOrCreate(spark)

// Create an H2OFrame from a local CSV file
import java.io.File
val dataFile = "src/main/resources/small_zipcode.csv"
val zipH2OFrame = new H2OFrame(new File(dataFile))

// Convert the H2OFrame to a Spark DataFrame
val zipDF = h2oContext.asDataFrame(zipH2OFrame)

asH2OFrame() – Convert RDD to H2OFrame

asH2OFrame() is an overloaded method; these variants take a Spark RDD as a parameter and return an H2OFrame.


asH2OFrame(rdd: SupportedRDD): H2OFrame
asH2OFrame(rdd: SupportedRDD, frameName: String)
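
A minimal sketch of the RDD variant, assuming an RDD of a case class (one of the supported RDD types); the Salary case class, its values, and the frame name "salary_frame" are illustrative and not from the article:


// Case class whose fields become the H2OFrame columns (illustrative)
case class Salary(name: String, amount: Double)

val salaryRDD = spark.sparkContext.parallelize(Seq(
    Salary("James", 3000.0),
    Salary("Anna", 4100.0)))

val h2oContext = H2OContext.getOrCreate(spark)
val salaryH2OFrame = h2oContext.asH2OFrame(salaryRDD, "salary_frame")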

asRDD() – H2OFrame to RDD

asRDD() transforms an H2OFrame to a Spark RDD of a specified Product (case class) type.


asRDD[A <: Product : TypeTag : ClassTag](fr: H2OFrame)
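
A minimal sketch of reading an H2OFrame back as an RDD of a case class; the Zipcode case class and its field names/types are assumptions about the sample CSV schema, not taken from the article:


// Field names are assumed to match the H2OFrame column names (illustrative)
case class Zipcode(id: Int, zipcode: Int, city: String, state: String, population: Int)

import java.io.File
val zipH2OFrame = new H2OFrame(new File("src/main/resources/small_zipcode.csv"))
val zipRDD = h2oContext.asRDD[Zipcode](zipH2OFrame)
zipRDD.take(2).foreach(println)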

asH2OFrameKeyString()

The asH2OFrameKeyString() method takes an RDD/DataFrame as a parameter, transforms it into an H2OFrame, and returns the string representation of the H2OFrame key.


//RDD
asH2OFrameKeyString(rdd: SupportedRDD): String 
asH2OFrameKeyString(rdd: SupportedRDD, frameName: String): String
//DataFrame
asH2OFrameKeyString(df: DataFrame): String
asH2OFrameKeyString(df: DataFrame, frameName: String): String
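
A minimal sketch of the DataFrame variant, reusing zipCodesDF from the earlier example; "zipcodes_frame" is just an illustrative frame name:


// Converts the DataFrame and returns the H2OFrame key as a String
val frameKeyString = h2oContext.asH2OFrameKeyString(zipCodesDF, "zipcodes_frame")
println(frameKeyString)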

toH2OFrameKey()

The toH2OFrameKey() method takes an RDD/DataFrame as input and transforms it into an H2OFrame key.


//RDD
toH2OFrameKey(rdd: SupportedRDD): Key[_]
toH2OFrameKey(rdd: SupportedRDD, frameName: String): Key[_]
//DataFrame
toH2OFrameKey(df: DataFrame): Key[Frame]
toH2OFrameKey(df: DataFrame, frameName: String): Key[Frame]
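
A minimal sketch of the DataFrame variant, again reusing zipCodesDF:


// Converts the DataFrame and returns the Key[Frame] of the resulting H2OFrame
val frameKey = h2oContext.toH2OFrameKey(zipCodesDF)
println(frameKey)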

Other methods of Sparkling Water H2OContext

h2oLocalClient() – Returns the IP:port of the H2O client's REST API (used in the sketch after this list)

setH2OClientLogLevel() – Changes the H2O log level on the driver

get() – Returns existing H2O context

getH2ONodes() – Returns an array of nodes

stop() – Stops the H2O context when it is running in the automatic mode of the external backend. In internal mode, the context shuts down when Spark stops its executors.

asH2OFrame(fr: Frame): H2OFrame – Creates a new H2OFrame from another Frame
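
A minimal sketch using a few of these helpers; the log level "WARN" is an arbitrary illustrative value:


val h2oContext = H2OContext.getOrCreate(spark)

// IP:port of the H2O client's REST API (e.g. to open the Flow UI)
println(h2oContext.h2oLocalClient)

// Change the H2O log level on the driver
h2oContext.setH2OClientLogLevel("WARN")

// Shut down the H2O context when the application is done
h2oContext.stop()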
