Installing & Running Sparkling Water on Mac OS

In this tutorial, you will learn how to install H2O Sparkling Water on Mac OS and run the H2O sparkling-shell and the Flow web interface. To run Sparkling Water, you need Apache Spark installed on your computer.

Sparkling Water lets you run H2O machine learning algorithms on a Spark cluster, so H2O benefits from Spark capabilities such as fast, scalable, distributed in-memory processing.

1. Install Java

Sparkling Water requires Java. Run the commands below to install a JDK; in my case, I am using OpenJDK.


# brew tap AdoptOpenJDK/openjdk
# brew install --cask adoptopenjdk8

After installing the JDK, verify that the installation succeeded by running:


# java -version
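
If you want to script this check, here is a minimal sketch that pulls the major version out of the java -version output. The helper name java_major_version is my own (hypothetical), and it assumes the usual version "x.y.z" line format. Spark 2.4 is normally run on Java 8, so 8 is the value to look for.

```shell
# Hypothetical helper: print the Java major version found in the text passed as $1.
# Java 8 reports itself as "1.8.0_x"; later releases report "11.0.x", "17.0.x", ...
java_major_version() {
  # Take the quoted version string from the first line that contains one.
  ver="$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)"
  case "$ver" in
    1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;  # legacy scheme: 1.8.0_242 -> 8
    *)   printf '%s\n' "$ver" | cut -d. -f1 ;;  # modern scheme: 11.0.6 -> 11
  esac
}
```

For example, java_major_version "$(java -version 2>&1)" should print 8 for the JDK installed above (java -version writes to stderr, hence the 2>&1).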

To find out where Java is installed, run:


# which java
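
On Mac OS, which java typically prints /usr/bin/java, which is a launcher stub rather than the JDK directory itself. As a rough sketch, the Mac OS utility /usr/libexec/java_home can report the actual JDK home, which you may want to export as JAVA_HOME. The -v 1.8 filter assumes the Java 8 JDK from the previous step.

```shell
# /usr/libexec/java_home only exists on Mac OS, so guard the call
# to keep this snippet harmless on other systems.
if [ -x /usr/libexec/java_home ]; then
  # Ask for a Java 8 JDK; stay quiet if none is registered.
  jdk_dir="$(/usr/libexec/java_home -v 1.8 2>/dev/null || true)"
  if [ -n "$jdk_dir" ]; then
    export JAVA_HOME="$jdk_dir"
  fi
fi
echo "JAVA_HOME=${JAVA_HOME:-<not set>}"
```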

2. Download and Install Apache Spark

First, download Apache Spark, extract the archive to a directory on your computer, and set the SPARK_HOME environment variable to that directory. I downloaded spark-2.4.4-bin-hadoop2.7; depending on when you are reading this, download the latest version that your Sparkling Water release supports. The steps should not have changed much.
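
Setting SPARK_HOME could look like the sketch below. The $HOME/spark-2.4.4-bin-hadoop2.7 location is an assumption; use whichever directory you extracted Spark into.

```shell
# Assumed extraction directory; change this to match your machine.
SPARK_DIR="$HOME/spark-2.4.4-bin-hadoop2.7"

# Point SPARK_HOME at the Spark directory and put its bin/ on the PATH.
export SPARK_HOME="$SPARK_DIR"
export PATH="$SPARK_HOME/bin:$PATH"

echo "SPARK_HOME=$SPARK_HOME"
```

To make the setting permanent, add the two export lines to your shell profile (for example ~/.zshrc or ~/.bash_profile).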

You can also install Spark using brew, as shown below.


# brew install apache-spark

3. Download & Install H2O Sparkling Water

Now download H2O Sparkling Water. At the time of writing there is no brew install for H2O Sparkling Water, so we need to download and install it manually.


macos:~$ wget https://s3.amazonaws.com/h2o-release/sparkling-water/spark-2.4/3.28.0.3-1-2.4/sparkling-water-3.28.0.3-1-2.4.zip

Then unzip the downloaded file. Mac OS ships with unzip; wget is not included by default, so install it first with brew install wget if the download command above is not found.


macos:~$ unzip sparkling-water-3.28.0.3-1-2.4.zip

In my case, I downloaded Sparkling Water version 3.28, which supports Spark 2.4.4, and unzipped it into /home/macos/sparkling-water-3.28.0.3-1-2.4
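
Mac OS ships with curl, so as an alternative to wget the download step could be sketched as below. The variable and function names are mine (hypothetical); the URL is assembled from the same version numbers used in the download link above.

```shell
# Version pieces taken from the download link used in this article.
SW_VERSION="3.28.0.3-1-2.4"
SPARK_MAJOR_MINOR="2.4"
SW_ZIP="sparkling-water-${SW_VERSION}.zip"
SW_URL="https://s3.amazonaws.com/h2o-release/sparkling-water/spark-${SPARK_MAJOR_MINOR}/${SW_VERSION}/${SW_ZIP}"

# Hypothetical helper: download the archive into the current directory and extract it.
# -f fails on HTTP errors, -L follows redirects, -O keeps the remote file name.
fetch_sparkling_water() {
  curl -f -L -O "$SW_URL" && unzip -q "$SW_ZIP"
}

echo "$SW_URL"
```

Call fetch_sparkling_water from the directory where you want Sparkling Water to live; it is defined but not run automatically here.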

4. Start Sparkling Shell on Mac

To start the Sparkling shell, open a terminal, change to the /home/macos/sparkling-water-3.28.0.3-1-2.4 directory, and run ./bin/sparkling-shell, which produces output like the following. This also initializes a Spark context whose Web UI is available at http://192.168.56.1:4040 (replace the IP address with your system IP).


macos:~/sparkling-water-3.28.0.3-1-2.4$ ./bin/sparkling-shell

Using Spark defined in the SPARK_HOME=/home/macos/spark environmental property


-----
  Spark master (MASTER)     : local[*]
  Spark home   (SPARK_HOME) : /home/macos/spark
  H2O build version         : 3.28.0.3 (yu)
  Sparkling Water version   : 3.28.0.3-1-2.4
  Spark build version       : 2.4.4
  Scala version             : 2.11
----

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://namenode.socal.rr.com:4040
Spark context available as 'sc' (master = local[*], app id = local-1581895354791).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/

Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_242)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
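
If the shell does not start as shown above, a quick sanity check of the environment can help. The sketch below is only an illustration: check_sparkling_env is a hypothetical helper of mine, and it assumes that sparkling-shell relies on SPARK_HOME and on the spark-shell launcher that ships in Spark's bin directory.

```shell
# Hypothetical helper: verify the pieces sparkling-shell is assumed to need.
# $1 is the directory where Sparkling Water was unzipped.
check_sparkling_env() {
  sw_dir="$1"
  if [ -z "${SPARK_HOME:-}" ]; then
    echo "SPARK_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$SPARK_HOME/bin/spark-shell" ]; then
    echo "No spark-shell under $SPARK_HOME/bin" >&2
    return 1
  fi
  if [ ! -x "$sw_dir/bin/sparkling-shell" ]; then
    echo "No sparkling-shell under $sw_dir/bin" >&2
    return 1
  fi
  echo "Environment looks ready: SPARK_HOME=$SPARK_HOME"
}
```

Run it as check_sparkling_env followed by the path to your Sparkling Water directory; it returns a non-zero status and prints the first problem it finds.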

Now let’s create an H2OContext, passing the SparkSession object “spark” as a parameter. This creates an H2O cloud inside the Spark cluster.


scala> import org.apache.spark.h2o._
import org.apache.spark.h2o._

scala> val h2oContext = H2OContext.getOrCreate(spark)
2020-02-16 23:53:28,362 WARN internal.InternalH2OBackend: To avoid non-deterministic behavior of Spark broadcast-based joins,
we recommend to set `spark.sql.autoBroadcastJoinThreshold` property of SparkSession to -1.
E.g. spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)
We also recommend to avoid using broadcast hints in your Spark SQL code.
h2oContext: org.apache.spark.h2o.H2OContext =

Sparkling Water Context:
 * Sparkling Water Version: 3.28.0.3-1-2.4
 * H2O name: sparkling-water-ubuntu_local-1581897180995
 * cluster size: 1
 * list of used nodes:
  (executorId, host, port)
  ------------------------
  (driver,192.168.56.1,54321)
  ------------------------

  Open H2O Flow in browser: http://192.168.56.1:54321 (CMD + click in Mac OSX)
scala>

This also starts the H2O Flow web UI, where you can interact with H2O and run machine learning models. Open Flow in a browser at http://192.168.56.1:54321 (replace the IP address with your system IP). For now, you can ignore the warnings.
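
To confirm from a second terminal that Flow is actually listening, something like the sketch below may be enough. Both function names are hypothetical, 54321 is the port shown in the H2OContext output above, and the check assumes the Flow root URL answers with a successful HTTP status.

```shell
# Hypothetical helper: build the Flow URL from a host and an optional port.
flow_url() {
  printf 'http://%s:%s\n' "$1" "${2:-54321}"
}

# Hypothetical helper: succeed if the Flow URL responds within 5 seconds.
# -s silences progress output, -f fails on HTTP errors, -L follows redirects.
flow_is_up() {
  curl -s -f -L -o /dev/null --max-time 5 "$(flow_url "$@")"
}
```

For example, flow_is_up 192.168.56.1 && echo "Flow is reachable" (again, use your own system IP).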

Figure: H2O Flow web UI running on Sparkling Water

Conclusion

In this article, you have learned how to install H2O Sparkling Water on Mac OS, run sparkling-shell, and create an H2OContext, which gives you access to the H2O Flow web UI.

Happy Learning!!

NNK
