The Hive partition table can be created using PARTITIONED BY
clause of the CREATE TABLE
statement. Use the partition key column along with the data type in PARTITIONED BY
clause.
In this article you will learn what is Hive partition, why do we need partitions, its advantages, and finally how to create a partition table.
Why do we need Partitions?
Hive partition is a way to organize a large table into several smaller tables based on one or multiple columns (partition key, for example, date, state e.t.c).
The hive partition is similar to table partitioning available in SQL server or any other RDBMS database tables.
Partition eliminates creating smaller physical tables, accessing, and managing them separately.
When you load the data into the partition table, Hive internally splits the records based on the partition key and stores each partition data into a separate directory on HDFS. The name of the directory would be partition key and it’s value.
Also, note that while loading the data into the partition table, Hive eliminates the partition key from the actual loaded file on HDFS as it is redundant information and could be get from the partition folder name.
Create Hive Partition Table
To create a Hive table with partitions, you need to use PARTITIONED BY
clause along with the column you wanted to partition and its type. Let’s create a partition table and load the CSV file into it.
To demonstrate partitions, I will be using a different dataset than I used before, you can download it from GitHub, It’s a simplified zipcodes codes where I have RecordNumber, Country, City, Zipcode, and State columns. I will be using State as a partition column.
Start your Hive beeline or Hive terminal and create the managed table as below.
CREATE TABLE zipcodes(
RecordNumber int,
Country string,
City string,
Zipcode int)
PARTITIONED BY(state string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
Describe Hive Partition Table
Let’s describe the Hive partition table we just created, describe command shows all partitions information
jdbc:hive2://127.0.0.1:10000> DESCRIBE zipcodes;
Yields below output. Notice the highlighted partition information for metadata of the partition columns.
Use DESCRIBE FORMATTED
for more information of the table partition.
jdbc:hive2://127.0.0.1:10000> DESCRIBE FORMATTED zipcodes;
Yields below output. Note the highlighted column names where it shows all partition column of the table and location where partitions will store.
You can also create a partition table with multiple partition keys as shown below. Here I am partitioned with state and zipcode.
CREATE TABLE zipcodes_multiple(
RecordNumber int,
Country string,
City string)
PARTITIONED BY(state string, zipcode string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
Happy Learning !!
Related Articles
- How to Show All Hive Partitions of a Table
- How to Update or Drop a Hive Partition
- Difference Between Managed vs External Tables
- How to Connect to Hive using Beeline CLI
- Hive Partitions Explained with Examples
- Hive – How to Show All Partitions of a Table?
- Hive Relational | Arithmetic | Logical Operators
- Apache Hive Installation on Ubuntu