Hive ALTER TABLE
command is used to update or drop a partition from a Hive Metastore and HDFS location (managed table). You can also manually update or drop a Hive partition directly on HDFS using Hadoop commands, if you do so you need to run the MSCK
command to synch up HDFS files with Hive Metastore.
Related Articles
- How to start HiveServer2 and Using Beeline
- Difference between Internal Managed Table and External Table
Hive partition breaks the table into multiple tables (on HDFS multiple subdirectories) based on the partition key. Partition key could be one or multiple columns. For each distinct value of the partition key, a subdirectory will be created on HDFS.
Let’s say you have a large table with a state column and you often required to run analytics-related queries for each state hence, the state column is qualified to be a partition column.
In order to explain update and drop Hive partition let’s assume you have a zipcodes table with the below data.
+--------------+---------+-------+
| recordnumber | zipcode | state |
+--------------+---------+-------+
| 54355 | 35146 | AL |
| 54356 | 35585 | AL |
| 39827 | 85209 | AZ |
| 39828 | 85210 | AZ |
| 49348 | 34487 | FL |
| 76512 | 27203 | NC |
| 76513 | 27204 | NC |
| 4 | 704 | PR |
| 3 | 704 | PR |
| 61391 | 76166 | TX |
+--------------+---------+-------+
Update Hive Partition
You can use the Hive ALTER TABLE
command to change the HDFS directory location of a specific partition. The below example update the state=NC partition location from the default Hive store to a custom location /data/state=NC.
jdbc:hive2://127.0.0.1:10000> ALTER TABLE zipcodes PARTITION(state='NC')
SET LOCATION '/data/state=NC';
Rename Hive Partition
You can also use ALTER TABLE with PARTITION RENAME
to rename the Hive partition. The below example rename partition state=’AL’ to state=’NY’
jdbc:hive2://127.0.0.1:10000> ALTER TABLE zipcodes PARTITION (state='AL') RENAME TO PARTITION (state='NY');
Manually Renaming Partitions on HDFS
Alternatively, you can also rename the partition directory on the HDFS. let’s rename partition state=’NY’ back to it’s original state=’AL’
hdfs dfs -mv /user/hive/warehouse/zipcodes/state=NY /user/hive/warehouse/zipcodes/state=AL
When you manually modify the partitions directly on HDFS, you need to run MSCK REPAIR TABLE
to update the Hive Metastore. Not doing so will result in inconsistent results.
SELECT
doesn’t show the renamed partition- SHOW PARTITIONS still shows the older partition
jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes;
+------------+
| partition |
+------------+
| state=AZ |
| state=FL |
| state=NC |
| state=NY |
| state=PR |
| state=TX |
+------------+
If you notice above, it still showing partition state=NY
, to correct this run MSCK REPAIR TABLE
jdbc:hive2://127.0.0.1:10000>MSCK REPAIR TABLE zipcodes SYNC PARTITIONS;
Above command synchronize zipcodes table on Hive Metastore. Now run the show partition command which shows state=AL partition.
Drop or Delete Hive Partition
Hive drop or delete partition is performed using ALTER TABLE tablename DROP
command. Dropping a partition from a table removes the data from HDFS and from Hive Metastore.
jdbc:hive2://127.0.0.1:10000> ALTER TABLE zipcodes DROP IF EXISTS PARTITION (state='AL');
If you attempt to drop a partition that does not exist, an error will be returned. However, you can alter this behavior using the optional IF EXISTS clause. With this clause, the drop partitions command will only execute if the partition exists in the Hive table.
If the Trash feature is enabled by setting hive.warehouse.data.skipTrash property to true, dropping a Hive partition relocates the partition data to the user’s .Trash directory. This allows you to recover the data if needed after the drop operation.
Note: Data moving to .Trash directory happens only for Internal/Managed table. For the external table, DROP partition just removes the partition from Hive Metastore and the partition is still present on HDFS. You need to run explicitly hadoop fs -rm
commnad to remove the partition from HDFS.
Using PURGE
By utilizing the PURGE option, you can prevent the data from being moved to the .Trash directory. Instead, it will be permanently deleted, and recovery will not be possible.
jdbc:hive2://127.0.0.1:10000> ALTER TABLE zipcodes DROP IF EXISTS PARTITION (state='AL') PURGE;
Above example permanently drops state=AL partition
Manually Dropping Partitions on HDFS
You can also delete the partition directly from HDFS using below command.
user@namenode:~/hive$ hdfs dfs -rm -R /user/hive/warehouse/zipcodes/state=AL
Deleted /user/hive/warehouse/zipcodes/state=AL
Running SELECT
command on the table doesn’t show the records from removed partitions, however, SHOW PARTITIONS
still shows the deleted partitions.
jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes;
+-----------------------------------+
| partition |
+-----------------------------------+
| state=AL |
| state=AZ |
| state=FL |
| state=NC |
| state=PR |
| state=TX |
+-----------------------------------+
6 rows selected (0.096 seconds)
In order to fix this, you need to run MSCK REPAIR TABLE
as shown below.
jdbc:hive2://127.0.0.1:10000> MSCK REPAIR TABLE zipcodes SYNC PARTITIONS;
jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes;
+-----------------------------------+
| partition |
+-----------------------------------+
| state=AZ |
| state=FL |
| state=NC |
| state=PR |
| state=TX |
+-----------------------------------+
5 rows selected (0.096 seconds)
Conclusion
In this article, you have learned how to update, drop or delete hive partition using ALTER TABLE
command, and also learned using SHOW PARTITIONS
to show the partitions of the table, using MSCK REPAIR
to synch Hive Metastore with the HDFS data.
Related Articles
- Hive List or Show All Partitions of a Table
- How to Connect to Hive Using Beeline
- How to Set Variables in Hive Scripts Examples
- How to connect to Hive from Java & Scala Examples
- Hive – INSERT INTO vs INSERT OVERWRITE Explained
- Hive Partitioning vs Bucketing with Examples?
- Hive Bucketing Explained with Examples
- Apache Hive Installation on Ubuntu