You are currently viewing Hive Delete and Update Records Using ACID Transactions

Since Hive Version 0.14, Hive supports ACID transactions like delete and update records/rows on Table with similar syntax as traditional SQL queries. You need to enable Hive ACID support and create a transactional table. On a table with transactional property, hive supports ACID transactions like Update and Delete operations.

Advertisements

If you try to Delete or Update records of Hive regular tables, you will get the error saying table is not transactional (state=42000,code=10297)

Hive delete update tables

Below is the sequence of steps involved to Update and Delete records/rows on the Hive table.

  1. Enable ACID Transaction Manager (DbTxnManager) on hive session
  2. Enable Concurrency
  3. Create Table by enabling transactional (TBLPROPERTIES (‘transactional’=’true’))
  4. Create Table with ORC storage type
  5. Insert data into a table using INSERT INTO
  6. Finally, Run UPDATE and DELETE HiveQL queries on the table

Let’s look at these in details.

Hive Enable ACID Transactions

Though Hive latest versions support ACID transactions like Update and Delete records, Hive doesn’t enable them by default hence you can’t perform Delete or Update operations. However, by setting the below properties you tell Hive to enable transaction manager that enables door for Delete and Updates queries to run.


SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

In case if you try to create a Transactional table without enabling the above properties, you will get the below error.

Error: Error while compiling statement: FAILED: SemanticException [Error 10265]: This command is not allowed on an ACID table emp.tr1 with a non-ACID transaction manager. Failed command: create table emp.employee_trans(id int,age int) STORED AS ORC TBLPROPERTIES (‘transactional’=’true’) (state=42000,code=10265)

Below are some important points to note on ACID transactional manager

  • To support ACID, Hive tables should be created with TRANSACTIONAL table property.
  • Currently, Hive supports transactional tables with Only ORC file format 
  • Transaction tables can be created, update, and read from only the ACID Transaction Manager session.
  • External tables cannot be created to support ACID since the changes on external tables are beyond Hive control.
  • LOAD is not supported on TRANSACTIONAL Tables. hence use INSERT INTO
  • On Transactional session, all operations are auto commit as BEGINCOMMIT, and ROLLBACK are not yet supported

Hive Delete Records From Table

Like traditional SQL, the DELETE statement on Hive has a similar syntax. The below example deletes a record from the table by specifying where filter.

Delete Statement Syntax


DELETE FROM [dbname.]tablename [WHERE expression]

Delete Statement Example


DELETE FROM emp.employee_trans WHERE id=4;

Post delete, selecting the table returns the below result without id=4.


SELECT * FROM emp.employee_trans;
+--------------------+----------------------+---------------------+------------------------+
| employee_trans.id  | employee_trans.name  | employee_trans.age  | employee_trans.gender  |
+--------------------+----------------------+---------------------+------------------------+
| 1                  | James                | 30                  | M                      |
| 2                  | Ann                  | 40                  | F                      |
| 3                  | Jeff                 | 41                  | M                      |
+--------------------+----------------------+---------------------+------------------------+
3 rows selected (0.532 seconds)
0: jdbc:hive2://127.0.0.1:10000>

Hive Update Records on Table

UPDATE statement on Hive also has a similar syntax to the traditional SQL update query.

Update Statement Syntax


UPDATE [dbname.]tablename SET column = value [, column = value ...] [WHERE expression]

Update Statement Example


UPDATE emp.employee_trans set age=45 where id=3;

Post UPDATE statement, selecting the table returns the below records. If you notice id=3, age got updated to 45.


SELECT * FROM emp.employee_trans;
+--------------------+----------------------+---------------------+------------------------+
| employee_trans.id  | employee_trans.name  | employee_trans.age  | employee_trans.gender  |
+--------------------+----------------------+---------------------+------------------------+
| 1                  | James                | 30                  | M                      |
| 2                  | Ann                  | 40                  | F                      |
| 3                  | Jeff                 | 45                  | M                      |
+--------------------+----------------------+---------------------+------------------------+
3 rows selected (0.741 seconds)
0: jdbc:hive2://127.0.0.1:10000>

Happy Learning !!

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium

Leave a Reply