Sparkling Water provides the same features and functionality as H2O and enables users to run the H2O machine learning algorithms API on top of a Spark cluster, allowing H2O to benefit from Spark capabilities like fast, scalable, distributed in-memory processing.
Sparkling Water also enables users to build H2O machine learning models from Java, Scala, R, and Python.
Integrating these two open-source environments (Spark and H2O) provides a seamless experience for users who want to run a query with Spark SQL, feed the results into H2O to build a model and make predictions, and then use the results again in Spark. For any given problem, better interoperability between tools provides a better experience. – H2O Sparkling Water
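This round trip can be sketched in Scala under a few stated assumptions: it presumes a running SparkSession `spark` and an H2OContext `h2oContext` (created later in this article), and the table and column names are hypothetical placeholders, not part of any real dataset.

```scala
import org.apache.spark.h2o._

// Assumes `spark` (SparkSession) and `h2oContext` (H2OContext) already
// exist, as in the sparkling-shell session shown later in this article.
// The table and column names below are hypothetical placeholders.

// 1. Run a query with Spark SQL.
val df = spark.sql("SELECT feature1, feature2, label FROM training_table")

// 2. Hand the result to H2O as an H2OFrame for model building.
val trainFrame = h2oContext.asH2OFrame(df)

// 3. Bring H2O results (e.g. predictions) back as a Spark DataFrame
//    for further processing in Spark.
val resultsDf = h2oContext.asDataFrame(trainFrame)
```

The exact conversion method names can vary between Sparkling Water releases, so treat this as a sketch of the workflow rather than a definitive implementation.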
Installing & Running Sparkling Water Shell on Windows
To run the Sparkling Shell, you need Apache Spark installed on your computer and the SPARK_HOME environment variable set to the Spark home directory. If you do not have it installed, download it from here, unzip it, and set the SPARK_HOME environment variable to your Spark directory.
Now, download H2O Sparkling Water and unzip the downloaded file. In my case, I've downloaded Sparkling Water version 3.28, which supports Spark 2.4.4, and unzipped it into C:\apps\opt\sparkling-water. From its bin directory, run sparkling-shell:
cd C:\apps\opt\sparkling-water\bin
C:\apps\opt\sparkling-water\bin>sparkling-shell

-----
  Spark master (MASTER)     : local[*]
  Spark home (SPARK_HOME)   : C:\apps\opt\spark-2.4.4-bin-hadoop2.7
  H2O build version         : 18.104.22.168 (yu)
  Spark build version       : 2.4.4
  Scala version             : 2.11
----

20/02/13 07:34:48 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://DELL-ESUHAO2KAJ:4040
Spark context available as 'sc' (master = local[*], app id = local-1581608102876).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_191)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
At the Scala prompt, import the H2O package and create an H2OContext, which starts an H2O cluster inside the Spark application:

scala> import org.apache.spark.h2o._
import org.apache.spark.h2o._

scala> val h2oContext = H2OContext.getOrCreate(spark)
h2oContext: org.apache.spark.h2o.H2OContext =

Sparkling Water Context:
 * Sparkling Water Version: 22.214.171.124-1-2.4
 * H2O name: sparkling-water-prabha_local-1581608102876
 * cluster size: 1
 * list of used nodes:
  (executorId, host, port)
  ------------------------
  (driver,192.168.56.1,54321)
  ------------------------

  Open H2O Flow in browser: http://192.168.56.1:54321 (CMD + click in Mac OSX)

scala>
Creating the H2OContext also starts the H2O Flow web UI for interacting with H2O. Open H2O Flow in a browser at http://192.168.56.1:54321 (change the IP address to your system's IP).
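When you are finished, the H2O cluster can be inspected and shut down from the same Scala prompt. A minimal sketch, assuming the `h2oContext` created above is still in scope (the `stopSparkContext` parameter name may differ between Sparkling Water releases):

```scala
// Print the cluster summary again, including the H2O Flow URL.
println(h2oContext)

// Shut down the H2O cluster; passing false keeps the Spark session running
// so you can continue working in Spark after H2O stops.
h2oContext.stop(stopSparkContext = false)
```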