Hive ALTER TABLE
command is used to update or drop a partition from a Hive Metastore and HDFS location (managed table). You can also manually update or drop a Hive partition directly on HDFS using Hadoop commands, if you do so you need to run the MSCK
command to synch up HDFS files with Hive Metastore.
Related Articles
- How to start HiveServer2 and Using Beeline
- Difference between Internal Managed Table and External Table
Hive partition breaks the table into multiple tables (on HDFS multiple subdirectories) based on the partition key. Partition key could be one or multiple columns. For each distinct value of the partition key, a subdirectory will be created on HDFS.
Let’s say you have a large table with a state column and you often required to run analytics-related queries for each state hence, the state column is qualified to be a partition column.
In order to explain update and drop Hive partition let’s assume you have a zipcodes table with the below data.
+--------------+---------+-------+
| recordnumber | zipcode | state |
+--------------+---------+-------+
| 54355 | 35146 | AL |
| 54356 | 35585 | AL |
| 39827 | 85209 | AZ |
| 39828 | 85210 | AZ |
| 49348 | 34487 | FL |
| 76512 | 27203 | NC |
| 76513 | 27204 | NC |
| 4 | 704 | PR |
| 3 | 704 | PR |
| 61391 | 76166 | TX |
+--------------+---------+-------+
Update Hive Partition
You can use the Hive ALTER TABLE
command to change the HDFS directory location of a specific partition. The below example update the state=NC partition location from the default Hive store to a custom location /data/state=NC.
jdbc:hive2://127.0.0.1:10000> ALTER TABLE zipcodes PARTITION(state='NC')
SET LOCATION '/data/state=NC';
Rename Hive Partition
You can also use ALTER TABLE with PARTITION RENAME
to rename the Hive partition. The below example rename partition state=’AL’ to state=’NY’
jdbc:hive2://127.0.0.1:10000> ALTER TABLE zipcodes PARTITION (state='AL') RENAME TO PARTITION (state='NY');
Manually Renaming Partitions on HDFS
Alternatively, you can also rename the partition directory on the HDFS. let’s rename partition state=’NY’ back to it’s original state=’AL’
hdfs dfs -mv /user/hive/warehouse/zipcodes/state=NY /user/hive/warehouse/zipcodes/state=AL
When you manually modify the partitions directly on HDFS, you need to run MSCK REPAIR TABLE
to update the Hive Metastore. Not doing so will result in inconsistent results.
SELECT
doesn’t show the renamed partition<a href="https://sparkbyexamples.com/apache-hive/hive-show-all-table-partitions/">SHOW PARTITIONS</a>
still shows the older partition
jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes;
+------------+
| partition |
+------------+
| state=AZ |
| state=FL |
| state=NC |
| state=NY |
| state=PR |
| state=TX |
+------------+
If you notice above, it still showing partition state=NY
, to correct this run MSCK REPAIR TABLE
jdbc:hive2://127.0.0.1:10000>MSCK REPAIR TABLE zipcodes SYNC PARTITIONS;
Above command synchronize zipcodes table on Hive Metastore. Now run the show partition command which shows state=AL partition.
Drop or Delete Hive Partition
Hive drop or delete partition is performed using ALTER TABLE tablename DROP
command. Dropping a partition from a table removes the data from HDFS and from Hive Metastore.
jdbc:hive2://127.0.0.1:10000> ALTER TABLE zipcodes DROP IF EXISTS PARTITION (state='AL');
When dropping a partition that doesn’t exist, it returns an error. This behavior can be changed using IF EXISTS
 optional clause; where it executes the drop partitions only when the partition present in the Hive table.
If Trash is configured by setting true
 to hive.warehouse.data.skipTrash
 property, dropping a Hive partition moves the partition data to users .Trash directory. You can recover this data post drop if needed.
Note: Data moving to .Trash directory happens only for Internal/Managed table. For the external table, DROP partition just removes the partition from Hive Metastore and the partition is still present on HDFS. You need to run explicitly hadoop fs -rm
commnad to remove the partition from HDFS.
Using PURGE
You can use the PURGE
option to not move the data to .Trash directory, the data will be permanently removed and it can not be recovered.
jdbc:hive2://127.0.0.1:10000> ALTER TABLE zipcodes DROP IF EXISTS PARTITION (state='AL') PURGE;
Above example permanently drops state=AL partition
Manually Dropping Partitions on HDFS
You can also delete the partition directly from HDFS using below command.
[email protected]:~/hive$ hdfs dfs -rm -R /user/hive/warehouse/zipcodes/state=AL
Deleted /user/hive/warehouse/zipcodes/state=AL
Running SELECT
command on the table doesn’t show the records from removed partitions, however, SHOW PARTITIONS
still shows the deleted partitions.
jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes;
+-----------------------------------+
| partition |
+-----------------------------------+
| state=AL |
| state=AZ |
| state=FL |
| state=NC |
| state=PR |
| state=TX |
+-----------------------------------+
6 rows selected (0.096 seconds)
In order to fix this, you need to run MSCK REPAIR TABLE
as shown below.
jdbc:hive2://127.0.0.1:10000> MSCK REPAIR TABLE zipcodes SYNC PARTITIONS;
jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes;
+-----------------------------------+
| partition |
+-----------------------------------+
| state=AZ |
| state=FL |
| state=NC |
| state=PR |
| state=TX |
+-----------------------------------+
5 rows selected (0.096 seconds)
Conclusion
In this article, you have learned how to update, drop or delete hive partition using ALTER TABLE
command, and also learned using SHOW PARTITIONS
to show the partitions of the table, using MSCK REPAIR
to synch Hive Metastore with the HDFS data.
Related Articles
- Hive List or Show All Partitions of a Table
- How to Connect to Hive Using Beeline
- How to Set Variables in Hive Scripts Examples
- How to connect to Hive from Java & Scala Examples
- Hive – INSERT INTO vs INSERT OVERWRITE Explained
- Hive Partitioning vs Bucketing with Examples?
- Hive Bucketing Explained with Examples
- Apache Hive Installation on Ubuntu