Apache Spark Installation on Windows

In this article, I will explain step by step how to install Apache Spark on Windows 7, 10, and later versions, and how to start a history server and monitor your jobs using the Web UI. Related: PySpark Install on Windows. Install Java 8 or Later: To install…

Hive Cast Function to Convert Data Type

The Hive CAST(from_datatype as to_datatype) function converts a value from one data type to another, for example String to Integer (int), String to Bigint, String to Decimal, Decimal to Int, and many more. This cast() function is referred to as the type conversion function which is used…

Hive Partitioning vs Bucketing with Examples?

In this article, I will explain what Hive Partitioning and Bucketing are and how they differ, exploring the advantages and disadvantages of each feature with examples. At a high level, Hive Partition is a way to split the large table into smaller tables based on the…
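
As a rough illustration of the difference, the sketch below maps rows to a partition directory and to a bucket number. It assumes a hypothetical table partitioned by `country` and bucketed by an integer `user_id` into 4 buckets. The function names and sample rows are invented for illustration, and taking the integer value modulo the bucket count is a simplification of Hive's hashing, not its actual implementation.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count (as in CLUSTERED BY ... INTO 4 BUCKETS)


def partition_dir(row):
    """Partitioning: one directory per distinct value of the partition column."""
    return f"country={row['country']}"


def bucket_id(row, num_buckets=NUM_BUCKETS):
    """Bucketing: a fixed number of files, chosen by hash(column) mod num_buckets.

    Simplification: for an integer column the hash is taken to be the value itself.
    """
    return row["user_id"] % num_buckets


def layout(rows):
    """Group user_ids by (partition directory, bucket number)."""
    placed = defaultdict(list)
    for row in rows:
        placed[(partition_dir(row), bucket_id(row))].append(row["user_id"])
    return dict(placed)


# Invented sample rows.
rows = [
    {"user_id": 1, "country": "US"},
    {"user_id": 5, "country": "US"},
    {"user_id": 2, "country": "IN"},
    {"user_id": 6, "country": "US"},
]

for key, ids in sorted(layout(rows).items()):
    print(key, ids)
```

The number of partition directories grows with the number of distinct `country` values, while the number of buckets stays fixed at whatever the table definition declares.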

Hive – How to Show All Partitions of a Table?

In Hive, the SHOW PARTITIONS command is used to show or list all partitions of a table from the Hive Metastore. In this article, I will explain how to list all partitions, filter partitions, and finally see the actual HDFS location of a partition. How to start HiveServer2 and using Beeline. Difference…

How to Update or Drop a Hive Partition?

The Hive ALTER TABLE command is used to update or drop a partition from the Hive Metastore and its HDFS location (managed table). You can also manually update or drop a Hive partition directly on HDFS using Hadoop commands; if you do so, you need to run the MSCK command to synch…

PySpark split() Column into Multiple Columns

pyspark.sql.functions provides a split() function to split a DataFrame string column into multiple columns. In this tutorial, you will learn how to split a single DataFrame column into multiple columns using withColumn() and select(), and how to use a regular expression (regex) with the split function. PySpark Split Column into multiple…

Hive Delete and Update Records Using ACID Transactions

Since version 0.14, Hive supports ACID transactions such as deleting and updating records/rows on a table, with syntax similar to traditional SQL queries. You need to enable Hive ACID support and create a transactional table. On a table with the transactional property, Hive supports ACID operations such as Update and Delete. You…

Export Hive Table into CSV File with Header?

To export a Hive table into a CSV file, you can either use INSERT OVERWRITE DIRECTORY or pipe the output of the select query into a CSV file. In this article, I will explain how to export a Hive table into a CSV file on HDFS, Local directory…
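
When the result of a select query is piped out of the Hive command line, the rows arrive as tab-separated text, so they still need converting to comma-separated form. The sketch below shows only that conversion step, assuming the header row has already been enabled (for example with `hive.cli.print.header=true`). The function name `tsv_to_csv` and the sample text are hypothetical.

```python
import csv
import io


def tsv_to_csv(tsv_text):
    """Convert tab-separated query output to CSV, quoting fields where needed."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in tsv_text.splitlines():
        if line:  # skip blank lines
            writer.writerow(line.split("\t"))
    return out.getvalue()


# Invented sample of tab-separated output with a header row.
sample = "id\tname\tcity\n1\tJames\tNew York\n2\tAnn\tSan Jose, CA\n"
print(tsv_to_csv(sample), end="")
```

Using the `csv` module rather than a plain tab-to-comma replacement keeps values that themselves contain commas intact, because such fields are quoted on output.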

Hive – Difference Between Internal Tables vs External Tables?

Let's learn what Internal (Managed) and External tables are and how they differ. The main difference is that internal tables are owned and managed by Hive, whereas external tables are not managed by Hive. In this article, I will explain the difference between internal and external tables, by…

How to Set Variables in HIVE Scripts

Hive variables are key-value pairs that can be set using the set command and used in scripts and Hive SQL. The values of variables in Hive scripts are substituted during query construction. In this article, I will explain Hive variables, how to create and set…
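
To make the substitution step concrete, here is a small sketch that mimics how a `${hivevar:name}` reference is replaced with its value in the query text before the query runs. It illustrates the idea only and is not Hive's implementation. The function `substitute`, the variable names, and the sample query are hypothetical.

```python
import re

# Matches references such as ${hivevar:name} or ${hiveconf:name}.
_VAR = re.compile(r"\$\{(hivevar|hiveconf):([^}]+)\}")


def substitute(query, variables):
    """Replace each ${namespace:key} reference with its value from `variables`.

    In this sketch, references to variables that are not set are left unchanged.
    """
    def repl(match):
        key = (match.group(1), match.group(2))
        return variables.get(key, match.group(0))

    return _VAR.sub(repl, query)


# Hypothetical values, as if set with: SET hivevar:tbl=emp; SET hivevar:min_age=30;
variables = {("hivevar", "tbl"): "emp", ("hivevar", "min_age"): "30"}
query = "SELECT * FROM ${hivevar:tbl} WHERE age > ${hivevar:min_age}"
print(substitute(query, variables))
```

Because the replacement is purely textual, the value is pasted into the query as written, so a string value needs its own quotes in the query text.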
