Hive partitions are used to split the larger table into several smaller parts based on one or multiple columns (partition key, for example, date, state e.t.c). The hive partition is similar to table partitioning available in SQL server or any other RDBMS database tables.
In this article you will learn what is Hive partition, why do we need partitions, its advantages, and finally how to create a partition table and performing some partition operations like add, rename, update, and delete partitions.
- What are Hive Partitions
- Partition Table Advantages
- Create Hive Partition Table
- Load Data into Partition Table
- Show All Partitions
- Add New Partition
- Rename or Update Partition
- Drop Hive Partition
What are Hive Partitions?
Hive table partition is a way to split a large table into smaller logical tables based on one or more partition keys. These smaller logical tables are not visible to users and users still access the data from just one table.
Partition eliminates creating smaller tables, accessing, and managing them separately.
When you load the data into the partition table, Hive internally splits the records based on the partition key and stores each partition data into a sub-directory of tables directory on HDFS. The name of the directory would be partition key and it’s value.
Also, note that while loading the data into the partition table, Hive eliminates the partition key from the actual loaded file on HDFS as it is redundant information and could be get from the partition folder name, will see this with examples in the next sessions.
Partition Table Advantages
As you are aware Hive is a data warehouse database for the Hadoop and used to store large big data sets. Below are some of the advantages using Hive partition tables.
- Fast accessed to the data
- Provides the ability to perform an operation on a smaller dataset
Create Hive Partition Table
To create a Hive table with partitions, you need to use PARTITIONED BY
clause along with the column you wanted to partition and its type. Let’s create a table and Load the CSV file.
The data file that I am using to explain partitions can be downloaded from GitHub, It’s a simplified zipcodes codes where I have RecordNumber, Country, City, Zipcode, and State columns. I will be using State as a partition column.
Start your Hive beeline or Hive terminal and create the managed table as below.
CREATE TABLE zipcodes(
RecordNumber int,
Country string,
City string,
Zipcode int)
PARTITIONED BY(state string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
Load Data into Partition Table
Download the zipcodes.CSV from GitHub, upload it to HDFS, and finally load the CSV file into a partition table.
hdfs dfs -put zipcodes.csv /data/
jdbc:hive2://127.0.0.1:10000> LOAD DATA INPATH '/data/zipcodes.csv' INTO TABLE zipcodes;
Show All Partitions on Hive Table
After loading the data into the Hive partition table, you can use SHOW PARTITIONS
command to see all partitions that are present.
jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes;
+------------+
| partition |
+------------+
| state=AL |
| state=AZ |
| state=FL |
| state=NC |
| state=PR |
| state=TX |
+------------+
6 rows selected (0.074 seconds)
Alternatively, if you know the Hive store location on the HDFS for your table, you can run the HDFS command to check the partitions.
For each partition on the table, you will see a folder created with the partition column name and the partition value.

Add New Partition to the Hive Table
A new partition can be added to the table using the ALERT TABLE
statement, you can also specify the location where you wanted to store partition data on HDFS.
jdbc:hive2://127.0.0.1:10000>ALTER TABLE zipcodes ADD PARTITION (state='CA') LOCATION '/user/data/zipcodes_ca';
SHOW PARTITIONS
return the added partition.
jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes;
+------------+
| partition |
+------------+
| state=AL |
| state=AZ |
| state=CA |
| state=FL |
| state=NC |
| state=PR |
| state=TX |
+------------+
7 rows selected (0.081 seconds)
Rename or Update Hive Partition
Using ALTER TABLE
, you can also rename or update the specific partition.
jdbc:hive2://127.0.0.1:10000> ALTER TABLE zipcodes PARTITION (state='AL') RENAME TO PARTITION (state='NY');
Manually Renaming Partitions on HDFS
Alternatively, you can also rename the partition directory on the HDFS.
hdfs dfs -mv /user/hive/warehouse/zipcodes/state=NY /user/hive/warehouse/zipcodes/state=AL
When you manually modify the partitions directly on HDFS, you need to run MSCK REPAIR TABLE
to update the Hive Metastore. Not doing so will result in inconsistent results.
SELECT
doesn’t show the renamed partitionSHOW PARTITIONS
still shows the older partition
jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes;
+------------+
| partition |
+------------+
| state=AZ |
| state=CA |
| state=FL |
| state=NC |
| state=NY |
| state=PR |
| state=TX |
+------------+
jdbc:hive2://127.0.0.1:10000>MSCK REPAIR TABLE zipcodes SYNC PARTITIONS;
Above command synchronize zipcodes table on Hive Metastore. Now run the show partition command which shows state=AL partition.
Drop Hive Partition
Dropping a partition can also be performed using ALTER TABLE tablename DROP
jdbc:hive2://127.0.0.1:10000> ALTER TABLE zipcodes DROP IF EXISTS PARTITION (state='AL');
Not using IF EXISTS
result in error when specified partition not exists.
Manually Dropping Partitions on HDFS
You can also delete the partition directly from HDFS using below command.
[email protected]:~/hive$ hdfs dfs -rm -R /user/hive/warehouse/zipcodes/state=AL
Deleted /user/hive/warehouse/zipcodes/state=AL
Running SELECT
command on the table doesn’t show the records from removed partitions, however, SHOW PARTITIONS
still shows the deleted partitions.
jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes;
+-----------------------------------+
| partition |
+-----------------------------------+
| state=AL |
| state=AZ |
| state=FL |
| state=NC |
| state=PR |
| state=TX |
+-----------------------------------+
6 rows selected (0.096 seconds)
In order to fix this, you need to run MSCK REPAIR TABLE
as shown below.
jdbc:hive2://127.0.0.1:10000> MSCK REPAIR TABLE zipcodes SYNC PARTITIONS;
jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes;
+-----------------------------------+
| partition |
+-----------------------------------+
| state=AZ |
| state=FL |
| state=NC |
| state=PR |
| state=TX |
+-----------------------------------+
5 rows selected (0.096 seconds)
Other Useful Partition Commands
Below are some of the additional partition commands that are useful.
How to Filter Partitions?
If you have 100’s of partitions, you can check if a specific partition exists on the table using SHOW PARTITIONS tablename PARTITION
SHOW PARTITIONS zipcodes PARTITION(state='NC');
+------------+
| partition |
+------------+
| state=NC |
+------------+
1 row selected (0.182 seconds)
How to Know Specific Partition Location on HDFS?
Either of the below statements is used to know the HDFS location of each partition.
DESCRIBE FORMATTED zipcodes PARTITION(state='PR');
SHOW TABLE EXTENDED LIKE zipcodes PARTITION(state='PR');
Running SHOW TABLE EXTENDED
on table and partition results in the below output. location attribute shows the location of the partition file on HDFS.

Conclusion
In this article, you have learned Hive table partition is used to split a larger table into smaller tables by splitting based on one or multiple partitions columns also learned the following
PARTITIONED BY
is used to create a partition tableALTER TABLE
is used to add, rename, drop partitionsSHOW PARTITIONS
is used to show the partitions of the tableMSCK REPAIR
is used to synch Hive Metastore with the HDFS data.
Related Articles
- Hive – How to Show All Partitions of a Table?
- Hive Relational | Arithmetic | Logical Operators
- Apache Hive Installation on Ubuntu
- Hive Bucketing Explained with Examples
- Hive Partitioning vs Bucketing with Examples?
- Hive – INSERT INTO vs INSERT OVERWRITE Explained
- Hive Load CSV File into Table
Reference
Happy Learning !!