Let’s learn what are Internal (Managed) and External tables and their differences, the main difference between Hive external table vs internal tables are owned and managed by Hive whereas external tables are not managed by Hive.
In this article I will explain the difference between internal vs external table, by creating them, checking the location, and dropping them.
Managed or External tables can be identified using the DESCRIBE FORMATTED table_name command
Related Articles:
- How to start HiveServer2 & Connect Beeline
- Where does Hive store data files in HDFS?
- How to Create Different Hive Tables explained syntax and usage
What is Hive Internal or Managed Table?
Internal tables are also known as Managed tables that are owned and managed by Hive. By default, Hive creates a table as an Internal table and owned the table structure and the files.
In other words, Hive completely manages the lifecycle of the table (metadata & data) similar to tables in RDBMS.
For Internal tables, Hive by default stores the files at the data warehouse location which is located at /user/hive/warehouse
When you drop an internal table, it drops the data and also drops the metadata of the table.
Below is an example of creating internal table.
CREATE TABLE IF NOT EXISTS emp.employee (
id int,
name string,
age int,
gender string )
COMMENT 'Employee Table'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
Use DESCRIBE FORMATTED emp.employee;
to get the description of the table and you should see Table Type as MANAGED TABLE
.
What is Hive External Table?
Data in External tables are not owned or managed by Hive. To create an External table you need to use EXTERNAL
clause.
Hive default stores external table files also at Hive managed data warehouse location but recommends to use external location using LOCATION
clause.
Dropping an external table just drops the metadata but not the actual data. The actual data is still accessible outside of Hive.
Below is an example of creating an external table in Hive. If you noticed we use EXTERNAL
and LOCATION
options.
CREATE EXTERNAL TABLE emp.employee_external (
id int,
name string,
age int,
gender string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/user/hive/data/employee_external';
Use DESCRIBE FORMATTED emp.employee_external;
to get the description of the table and you should see Table Type as EXTERNAL TABLE
.
Drop Internal vs External Tables
Regardless of the Internal and external table, Hive manages the table definition and its partition information in Hive Metastore. Dropping an internal table deletes the table metadata from Metastore and also removes all its data/files from HDFS.
Dropping an external table, just drop the metadata of the table from Metastore and keeps the actual data as-is om HDFS location.
Difference Between Internal vs External Tables
Below are the major differences between Internal vs External tables in Apache Hive.
Internal or Managed Table | External Table |
---|---|
By default, Hive creates an Internal or Managed Table. | Use EXTERNAL option/clause to create an external table |
Hive owns the metadata, table data by managing the lifecycle of the table | Hive manages the table metadata but not the underlying file. |
Dropping an Internal table drops metadata from Hive Metastore and files from HDFS | Dropping an external table drops just metadata from Metastore with out touching actual file on HDFS. |
Hive supports ARCHIVE, UNARCHIVE, TRUNCATE, MERGE, CONCATENATE operations | Not supported |
Supports ACID/Transactional | Not supported |
Supports result caching | Not supported |
When to use External and Internal Tables
Use managed tables when Hive should manage the lifecycle of the table, or when generating temporary tables.
Use external tables when files are already present or in remote locations, and the files should remain even if the table is dropped.
Conclusion
In this difference between the Internal and External tables article, you have learned internal/managed tables metadata and files are owned Hive server and manages complete table life cycle whereas only metadata is owned by external tables meaning dropping an external table just drops it’s metadata but not the actual file and also learned when to use internal table vs external table.
Happy Learning !!
Related Article
- Hive Data Types – Primitive | Complex
- Hive Built-in String Functions with Examples
- Hive DDL Commands Explained with Examples
- How to Connect to Hive Using Beeline
- Hive Create Partition Table Explained
- How to Update or Drop a Hive Partition?
- Hive Partitions Explained with Examples