Hive – How to Enable and Use ACID Transactions?

Starting Version 0.14, Hive supports all ACID properties which enable us to use transactions, create transactional tables, and run queries like Insert, Update, and Delete on tables. In this article, I will explain how to enable and disable ACID Transactions Manager, create a transactional table, and finally performing Insert, Update, and Delete operations.

Table of Contents

Hive is a data warehouse database where the data is typically loaded from batch processing for analytical purposes and older versions of Hive doesn’t support ACID transactions on tables. Though in newer versions it supports by default ACID transactions are disabled and you need to enable it before start using it.

Hive supports full ACID semantics at the row level so that one application can add rows while another reads from the same partition without interfering with each other.

I will be using HiveServer2 and using Beeline commands

Hive Enable ACID Transactions

As said in the introduction, you need to enable ACID Transactions to support transactional queries. one of the important property need to know is hive.txn.manager which is used to set Hive Transaction manager, by default hive uses DummyTxnManager, to enable ACID, we need to set it to DbTxnManager

Below are the properties you need to enable ACID transactions.


SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
# The follwoing are not required if you are using Hive 2.0
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nostrict;
# The following parameters are required for standalone hive metastore
SET hive.compactor.initiator.on=true;
SET hive.compactor.worker.threads=1

Besides this, you also need to create a Transactional table by using

  • TBLPROPERTIES (‘transactional’=’true’) at the time of creating Managed table.
  • Managed Table should use ORC format.

ACID Limitations On Hive

Below are some of the limitations of using Hive ACID transactions.

  • To support ACID, Hive tables should be created with TRANSACTIONAL table property.
  • Currently, Hive supports ACID transactions on tables that store ORC file format.
  • Enable ACID support by setting transaction manager to DbTxnManager
  • Transaction tables cannot be accessed from the non-ACID Transaction Manager (DummyTxnManager) session.
  • External tables cannot be created to support ACID since the changes on external tables are beyond Hive control.
  • LOAD is not supported on ACID transactional Tables. hence use INSERT INTO.
  • On Transactional session, all operations are auto commit as <strong>BEGIN</strong><strong>COMMIT</strong>, and <strong>ROLLBACK</strong> are not yet supported.

Hive Create Transactional Table

To support ACID transactions you need to create a table with TBLPROPERTIES (‘transactional’=’true’); and the store type of the table should be ORC.

Note: Once you create a table as an ACID table via TBLPROPERTIES (“transactional”=”true”), you cannot convert it back to a non-ACID table.


CREATE TABLE emp.employee_trans (
 id int,
 name string,
 age int,
 gender string)
 STORED AS ORC
 TBLPROPERTIES ('transactional'='true');

You can run the DESCRIBE FORMATTED emp.employee to check if the table is created with the transactional data_type as TRUE.

Hive INSERT Records to Table

Hive INSERT SQL query statement is used to insert individual or many records into the transactional table. If you are familiar with ANSI SQL, Hive uses similar syntax for basic queries like INSERT, UPDATE, and DELETE queries.

Insert Statement Syntax:


INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]

Below example insert few records into the table.


INSERT INTO emp.employee_trans VALUES(1,'James',30,'M');
INSERT INTO emp.employee_trans VALUES(2,'Ann',40,'F');
INSERT INTO emp.employee_trans VALUES(3,'Jeff',41,'M');
INSERT INTO emp.employee_trans VALUES(4,'Jennifer',20,'F')

Hive UPDATE Table

Hive UPDATE SQL query is used to update the existing records in a table, WHERE is an optional clause and below are some points to note using the WHERE clause with an update.

  • When WHERE clause not used, Hive updates all records in a table.
  • By using WHERE clause you can specify a condition which records to update.

Update statement Syntax:


UPDATE [dbname.]tablename 
SET column = value [, column = value ...] 
[WHERE expression]

Below example updates age column to 45 for record id=3.


UPDATE emp.employee_trans 
SET age=45 
WHERE id=3;

Post UPDATE statement, selecting the table returns the below records. If you notice id=3, age got updated to 45.


SELECT * FROM emp.employee_trans;
+--------------------+----------------------+---------------------+------------------------+
| employee_trans.id  | employee_trans.name  | employee_trans.age  | employee_trans.gender  |
+--------------------+----------------------+---------------------+------------------------+
| 1                  | James                | 30                  | M                      |
| 2                  | Ann                  | 40                  | F                      |
| 3                  | Jeff                 | 45                  | M                      |
+--------------------+----------------------+---------------------+------------------------+
3 rows selected (0.741 seconds)
0: jdbc:hive2://127.0.0.1:10000>

Hive DELETE Records from Table

Hive DELETE SQL query is used to delete the records from a table. Hive also takes optional WHERE clause and below are some points to remember using WHERE clause.

  • Not using a WHERE clause with DELETE statement, Hive delete all records from the table. This is similar to truncating the table.
  • Use WHERE clause to specify what records you wanted to update.

Delete statement Syntax:


DELETE FROM [dbname.]tablename 
[WHERE expression]

Below DELETE example, delete record with id=4 from the table.


DELETE FROM emp.employee_trans 
WHERE id=4;

Post delete, selecting the table returns the below 3 records without id=4.


SELECT * FROM emp.employee_trans;
+--------------------+----------------------+---------------------+------------------------+
| employee_trans.id  | employee_trans.name  | employee_trans.age  | employee_trans.gender  |
+--------------------+----------------------+---------------------+------------------------+
| 1                  | James                | 30                  | M                      |
| 2                  | Ann                  | 40                  | F                      |
| 3                  | Jeff                 | 41                  | M                      |
+--------------------+----------------------+---------------------+------------------------+
3 rows selected (0.532 seconds)
0: jdbc:hive2://127.0.0.1:10000>

Other Hive ACID Transactional Commands

SHOW TRANSACTIONS

<a href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowTransactions">SHOW TRANSACTIONS</a> statement is used to return the list of all transactions with start and end time along with other transaction properties.


SHOW TRANSACTIONS;

Returns below table with all transactions you run.

hive shw acid transactions
SHOW TRANSACTIONS

SHOW COMPACTIONS

SHOW COMPACTIONS statement returns all tables and partitions that are compacted or scheduled for compaction. Compaction is run automatically when Hive transactions are being used.


SHOW COMPACTIONS;

SHOW LOCKS

SHOW LOCKS statement is used to check the locks on the table or partitions. When working with transactions we often see table and records are getting locked. When the table is locked by another transaction you cannot run an update or delete until the locks are released.


SHOW LOCKS;
SHOW LOCKS <database_name>;
SHOW LOCKS <table_name>;
SHOW LOCKS <table_name> PARTITION (<partition_spec>);

Hive Disable ACID Transactions

Sometimes you may need to disable ACID Transactions, in order to do so you need to set the below properties back to their original values.


SET hive.support.concurrency=false;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager;
SET hive.enforce.bucketing=false;
SET hive.exec.dynamic.partition.mode=strict;
SET hive.compactor.initiator.on=false;
SET hive.compactor.worker.threads=0;

Conclusion

In summary to enable ACID like transactions on Hive, you need to do the follwoing.

  1. Enable ACID Transaction Manager (DbTxnManager) on hive session
  2. Enable Concurrency
  3. Create Table by enabling transactional (TBLPROPERTIES (‘transactional’=’true’))
  4. Create Table with ORC storage type
  5. Insert data into a table using INSERT INTO
  6. Finally, Run UPDATE and DELETE HiveQL queries on the table

Hope you like it.

Happy Learning !!

NNK

SparkByExamples.com is a Big Data and Spark examples community page, all examples are simple and easy to understand and well tested in our development environment Read more ..

Leave a Reply

Hive – How to Enable and Use ACID Transactions?