Site icon Spark By {Examples}

Hive Enable and Use Acid Transactions

hive enable acid transactions

Starting Version 0.14, Hive supports all ACID properties which enable us to use transactions, create transactional tables, and run queries like Insert, Update, and Delete on tables. In this article, I will explain how to enable and disable ACID Transactions Manager, create a transactional table, and finally performing Insert, Update, and Delete operations.

Table of Contents

Hive is a data warehouse database where the data is typically loaded from batch processing for analytical purposes and older versions of Hive doesn’t support ACID transactions on tables. Though in newer versions it supports by default ACID transactions are disabled and you need to enable it before start using it.

Hive supports full ACID semantics at the row level so that one application can add rows while another reads from the same partition without interfering with each other.

I will be using HiveServer2 and using Beeline commands

Hive Enable ACID Transactions

As said in the introduction, you need to enable ACID Transactions to support transactional queries. one of the important property need to know is hive.txn.manager which is used to set Hive Transaction manager, by default hive uses DummyTxnManager, to enable ACID, we need to set it to DbTxnManager

Below are the properties you need to enable ACID transactions.


SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
# The follwoing are not required if you are using Hive 2.0
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nostrict;
# The following parameters are required for standalone hive metastore
SET hive.compactor.initiator.on=true;
SET hive.compactor.worker.threads=1

Besides this, you also need to create a Transactional table by using

ACID Limitations On Hive

Below are some of the limitations of using Hive ACID transactions.

Hive Create Transactional Table

To support ACID transactions you need to create a table with TBLPROPERTIES (‘transactional’=’true’); and the store type of the table should be ORC.

Note: Once you create a table as an ACID table via TBLPROPERTIES (“transactional”=”true”), you cannot convert it back to a non-ACID table.


CREATE TABLE emp.employee_trans (
 id int,
 name string,
 age int,
 gender string)
 STORED AS ORC
 TBLPROPERTIES ('transactional'='true');

You can run the DESCRIBE FORMATTED emp.employee to check if the table is created with the transactional data_type as TRUE.

Hive INSERT Records to Table

Hive INSERT SQL query statement is used to insert individual or many records into the transactional table. If you are familiar with ANSI SQL, Hive uses similar syntax for basic queries like INSERT, UPDATE, and DELETE queries.

Insert Statement Syntax:


INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]

Below example insert few records into the table.


INSERT INTO emp.employee_trans VALUES(1,'James',30,'M');
INSERT INTO emp.employee_trans VALUES(2,'Ann',40,'F');
INSERT INTO emp.employee_trans VALUES(3,'Jeff',41,'M');
INSERT INTO emp.employee_trans VALUES(4,'Jennifer',20,'F')

Hive UPDATE Table

Hive UPDATE SQL query is used to update the existing records in a table, WHERE is an optional clause and below are some points to note using the WHERE clause with an update.

Update statement Syntax:


UPDATE [dbname.]tablename 
SET column = value [, column = value ...] 
[WHERE expression]

Below example updates age column to 45 for record id=3.


UPDATE emp.employee_trans 
SET age=45 
WHERE id=3;

Post UPDATE statement, selecting the table returns the below records. If you notice id=3, age got updated to 45.


SELECT * FROM emp.employee_trans;
+--------------------+----------------------+---------------------+------------------------+
| employee_trans.id  | employee_trans.name  | employee_trans.age  | employee_trans.gender  |
+--------------------+----------------------+---------------------+------------------------+
| 1                  | James                | 30                  | M                      |
| 2                  | Ann                  | 40                  | F                      |
| 3                  | Jeff                 | 45                  | M                      |
+--------------------+----------------------+---------------------+------------------------+
3 rows selected (0.741 seconds)
0: jdbc:hive2://127.0.0.1:10000>

Hive DELETE Records from Table

Hive DELETE SQL query is used to delete the records from a table. Hive also takes optional WHERE clause and below are some points to remember using WHERE clause.

Delete statement Syntax:


DELETE FROM [dbname.]tablename 
[WHERE expression]

Below DELETE example, delete record with id=4 from the table.


DELETE FROM emp.employee_trans 
WHERE id=4;

Post delete, selecting the table returns the below 3 records without id=4.


SELECT * FROM emp.employee_trans;
+--------------------+----------------------+---------------------+------------------------+
| employee_trans.id  | employee_trans.name  | employee_trans.age  | employee_trans.gender  |
+--------------------+----------------------+---------------------+------------------------+
| 1                  | James                | 30                  | M                      |
| 2                  | Ann                  | 40                  | F                      |
| 3                  | Jeff                 | 41                  | M                      |
+--------------------+----------------------+---------------------+------------------------+
3 rows selected (0.532 seconds)
0: jdbc:hive2://127.0.0.1:10000>

Other Hive ACID Transactional Commands

SHOW TRANSACTIONS

SHOW TRANSACTIONS statement is used to return the list of all transactions with start and end time along with other transaction properties.


SHOW TRANSACTIONS;

Returns below table with all transactions you run.

hive shw acid transactions
SHOW TRANSACTIONS

SHOW COMPACTIONS

SHOW COMPACTIONS statement returns all tables and partitions that are compacted or scheduled for compaction. Compaction is run automatically when Hive transactions are being used.


SHOW COMPACTIONS;

SHOW LOCKS

SHOW LOCKS statement is used to check the locks on the table or partitions. When working with transactions we often see table and records are getting locked. When the table is locked by another transaction you cannot run an update or delete until the locks are released.


SHOW LOCKS;
SHOW LOCKS <database_name>;
SHOW LOCKS <table_name>;
SHOW LOCKS <table_name> PARTITION (<partition_spec>);

Hive Disable ACID Transactions

Sometimes you may need to disable ACID Transactions, in order to do so you need to set the below properties back to their original values.


SET hive.support.concurrency=false;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager;
SET hive.enforce.bucketing=false;
SET hive.exec.dynamic.partition.mode=strict;
SET hive.compactor.initiator.on=false;
SET hive.compactor.worker.threads=0;

Conclusion

In summary to enable ACID like transactions on Hive, you need to do the follwoing.

  1. Enable ACID Transaction Manager (DbTxnManager) on hive session
  2. Enable Concurrency
  3. Create Table by enabling transactional (TBLPROPERTIES (‘transactional’=’true’))
  4. Create Table with ORC storage type
  5. Insert data into a table using INSERT INTO
  6. Finally, Run UPDATE and DELETE HiveQL queries on the table

Hope you like it.

Happy Learning !!

Exit mobile version