Sparkling Water – H2OContext

H2OContext is the entry point to Sparkling Water. It is used to connect to an external H2O cluster or to create a standalone local cluster. The H2OContext class provides methods to convert an RDD or DataFrame to an H2OFrame and an H2OFrame back to an RDD or DataFrame, along with implicit conversions.

Creating H2OContext object

H2OContext is created with the factory method getOrCreate(), which takes a SparkSession object as a parameter and, optionally, an H2OConf object.


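import org.apache.spark.sql.SparkSession
// Package used by older Sparkling Water releases; newer ones use ai.h2o.sparkling
import org.apache.spark.h2o._

val spark = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExample")
    .getOrCreate()

// Start (or reuse) the H2O context on top of the Spark session
val h2oContext = H2OContext.getOrCreate(spark)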

Using H2OConf with H2O context

H2OConf (H2O Configuration) is a wrapper around SparkConf: it inherits all SparkConf properties and adds properties specific to the H2O Sparkling Water cluster.


val h2oConf = new H2OConf(spark).setCloudName("CloudName1")
val h2oContext = H2OContext.getOrCreate(spark, h2oConf)

The example above assigns the cloud name “CloudName1” to the H2O cluster. For more details, see Usage of H2O Configuration.

asH2OFrame() – Convert DataFrame to H2OFrame

Use the asH2OFrame() method of H2OContext to convert a Spark DataFrame to a Sparkling Water H2OFrame.

Syntax


asH2OFrame(df: DataFrame): H2OFrame
asH2OFrame(df: DataFrame, frameName: String): H2OFrame

Example


val zipCodes = "src/main/resources/small_zipcode.csv"
val zipCodesDF = spark.read.option("header", "true")
    .option("inferSchema", "true")
    .csv(zipCodes)

val h2oContext = H2OContext.getOrCreate(spark)
val h2oFrame = h2oContext.asH2OFrame(zipCodesDF)
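
The second overload in the syntax above also accepts a frame name. The snippet below is a minimal sketch of that variant, not a definitive recipe: it assumes an older Sparkling Water release where H2OContext lives in the org.apache.spark.h2o package, and the frame name "zipcodes" is a hypothetical name chosen for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.h2o._ // assumption: older Sparkling Water package layout

val spark = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExample")
    .getOrCreate()

val h2oContext = H2OContext.getOrCreate(spark)

// Same CSV file as in the example above
val zipCodesDF = spark.read.option("header", "true")
    .option("inferSchema", "true")
    .csv("src/main/resources/small_zipcode.csv")

// "zipcodes" is a hypothetical frame name used only in this sketch
val namedFrame = h2oContext.asH2OFrame(zipCodesDF, "zipcodes")

// The H2OFrame should have the same shape as the source DataFrame
println(s"rows=${namedFrame.numRows()}, cols=${namedFrame.numCols()}")
```

Giving the frame an explicit name can make it easier to recognize among other frames in the H2O cluster.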

asDataFrame() – Convert H2OFrame to DataFrame

Use the asDataFrame() method of H2OContext to convert a Sparkling Water H2OFrame to a Spark DataFrame.

Syntax


 def asDataFrame[T <: Frame](fr: T, copyMetadata: Boolean = true): DataFrame

Example


  val h2oContext = H2OContext.getOrCreate(spark)

  //Creating H2OFrame
  import java.io.File
  val dataFile = "src/main/resources/small_zipcode.csv"
  val zipH2OFrame = new H2OFrame(new File(dataFile))

  //Convert H2OFrame to Spark DataFrame
  val zipDF = h2oContext.asDataFrame(zipH2OFrame)
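
The syntax above also lists a copyMetadata parameter. The following is a rough sketch of a DataFrame to H2OFrame to DataFrame round trip that sets it explicitly. It assumes an older Sparkling Water release (org.apache.spark.h2o package), and the rows and column names are made up for illustration. The comment on what copyMetadata does reflects an assumption about its behavior, so check it against the documentation for your version.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.h2o._ // assumption: older Sparkling Water package layout

val spark = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExample")
    .getOrCreate()
import spark.implicits._

val h2oContext = H2OContext.getOrCreate(spark)

// Made-up rows, used only to have something to convert
val sourceDF = Seq((1, "alpha"), (2, "beta")).toDF("id", "label")

// Spark DataFrame -> H2OFrame
val h2oFrame = h2oContext.asH2OFrame(sourceDF)

// H2OFrame -> Spark DataFrame; copyMetadata = false is assumed to skip
// copying H2O column metadata into the DataFrame schema
val roundTripDF = h2oContext.asDataFrame(h2oFrame, copyMetadata = false)

roundTripDF.printSchema()
```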

asH2OFrame() – Convert RDD to H2OFrame

asH2OFrame() is overloaded: this variant takes a Spark RDD as a parameter and returns an H2OFrame.


asH2OFrame(rdd: SupportedRDD): H2OFrame
asH2OFrame(rdd: SupportedRDD, frameName: String): H2OFrame
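
Example

A minimal sketch of the RDD variant, assuming an older Sparkling Water release (org.apache.spark.h2o package) in which an RDD of case class instances is converted implicitly to SupportedRDD. The ZipCode case class and its rows are hypothetical and exist only for this illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.h2o._ // assumption: older Sparkling Water package layout

// Hypothetical record type; in a compiled application define it at top level
case class ZipCode(zip: Int, city: String, state: String)

val spark = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExample")
    .getOrCreate()

val h2oContext = H2OContext.getOrCreate(spark)

// Made-up rows for illustration
val zipRDD = spark.sparkContext.parallelize(Seq(
    ZipCode(10001, "New York", "NY"),
    ZipCode(94105, "San Francisco", "CA")
))

// The RDD of case classes is expected to be converted to SupportedRDD implicitly
val zipFrame: H2OFrame = h2oContext.asH2OFrame(zipRDD)

println(s"rows=${zipFrame.numRows()}, cols=${zipFrame.numCols()}")
```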

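asRDD() – Convert H2OFrame to RDD

asRDD() converts an H2OFrame to a Spark RDD. The type parameter A is a Product type, typically a case class, that describes one row of the frame.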


asRDD[A <: Product : TypeTag : ClassTag](fr: H2OFrame)
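
Example

The sketch below builds a small H2OFrame from an RDD and reads it back with asRDD(). It assumes an older Sparkling Water release (org.apache.spark.h2o package). The Reading case class and its values are hypothetical. Its fields are declared as Option so that missing values in the frame can be represented, and the field names are assumed to match the frame's column names.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.h2o._ // assumption: older Sparkling Water package layout

// Hypothetical row type; Option fields allow for missing values
case class Reading(id: Option[Int], value: Option[Double])

val spark = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExample")
    .getOrCreate()

val h2oContext = H2OContext.getOrCreate(spark)

// Made-up rows, converted to an H2OFrame first so there is something to read back
val sourceRDD = spark.sparkContext.parallelize(Seq(
    Reading(Some(1), Some(0.5)),
    Reading(Some(2), Some(1.5))
))
val readingFrame: H2OFrame = h2oContext.asH2OFrame(sourceRDD)

// H2OFrame -> RDD of Reading
val readings = h2oContext.asRDD[Reading](readingFrame)

readings.collect().foreach(println)
```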

asH2OFrameKeyString()

The asH2OFrameKeyString() method takes an RDD or DataFrame as a parameter, converts it to an H2OFrame, and returns the string representation of the H2OFrame key.


//RDD
asH2OFrameKeyString(rdd: SupportedRDD): String 
asH2OFrameKeyString(rdd: SupportedRDD, frameName: String): String
//DataFrame
asH2OFrameKeyString(df: DataFrame): String
asH2OFrameKeyString(df: DataFrame, frameName: String): String
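
Example

A short sketch of the DataFrame variant with a frame name, assuming an older Sparkling Water release (org.apache.spark.h2o package). The rows, column names, and the frame name "labels_frame" are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.h2o._ // assumption: older Sparkling Water package layout

val spark = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExample")
    .getOrCreate()
import spark.implicits._

val h2oContext = H2OContext.getOrCreate(spark)

// Made-up rows for illustration
val labelsDF = Seq((1, "alpha"), (2, "beta")).toDF("id", "label")

// Converts the DataFrame and returns the key of the resulting H2OFrame as a String;
// "labels_frame" is a hypothetical frame name
val frameKey: String = h2oContext.asH2OFrameKeyString(labelsDF, "labels_frame")

println(frameKey)
```

The returned string identifies the frame in the H2O cluster, which can be handy when only the key, and not the frame object, needs to be passed around.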

toH2OFrameKey()

The toH2OFrameKey() method takes an RDD or DataFrame as input, converts it to an H2OFrame, and returns the H2OFrame key.


//RDD
toH2OFrameKey(rdd: SupportedRDD): Key[_]
toH2OFrameKey(rdd: SupportedRDD, frameName: String): Key[_]
//DataFrame
toH2OFrameKey(df: DataFrame): Key[Frame]
toH2OFrameKey(df: DataFrame, frameName: String): Key[Frame]

Other methods of Sparkling Water H2OContext

h2oLocalClient() – Returns IP:Port of REST API of H2O client

setH2OClientLogLevel() – Change the log level for the driver

get() – Returns existing H2O context

getH2ONodes() – Returns an array of nodes

stop() – Stops the H2O context when it is running in automatic mode of the external backend. In internal mode, the context shuts down when Spark stops its executors.

asH2OFrame(fr: Frame): H2OFrame – Creates a new H2OFrame from another Frame