Let’s learn what are Internal (Managed) and External tables and their differences, the main difference between Hive external table vs internal tables are owned and managed by Hive whereas external tables are not managed by Hive.
In this article I will explain the difference between internal vs external table, by creating them, checking the location, and dropping them.
Managed or External tables can be identified using the DESCRIBE FORMATTED table_name command
- How to start HiveServer2 & Connect Beeline
- Where does Hive store data files in HDFS?
- How to Create Different Hive Tables explained syntax and usage
What is Hive Internal or Managed Table?
Internal tables are also known as Managed tables that are owned and managed by Hive. By default, Hive creates a table as an Internal table and owned the table structure and the files.
In other words, Hive completely manages the lifecycle of the table (metadata & data) similar to tables in RDBMS.
For Internal tables, Hive by default stores the files at the data warehouse location which is located at
When you drop an internal table, it drops the data and also drops the metadata of the table.
Below is an example of creating internal table.
CREATE TABLE IF NOT EXISTS emp.employee (
gender string )
COMMENT 'Employee Table'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
DESCRIBE FORMATTED emp.employee; to get the description of the table and you should see Table Type as
What is Hive External Table?
Data in External tables are not owned or managed by Hive. To create an External table you need to use
Hive default stores external table files also at Hive managed data warehouse location but recommends to use external location using
Dropping an external table just drops the metadata but not the actual data. The actual data is still accessible outside of Hive.
Below is an example of creating an external table in Hive. If you noticed we use
CREATE EXTERNAL TABLE emp.employee_external (
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
DESCRIBE FORMATTED emp.employee_external; to get the description of the table and you should see Table Type as
Drop Internal vs External Tables
Regardless of the Internal and external table, Hive manages the table definition and its partition information in Hive Metastore. Dropping an internal table deletes the table metadata from Metastore and also removes all its data/files from HDFS.
Dropping an external table, just drop the metadata of the table from Metastore and keeps the actual data as-is om HDFS location.
Difference Between Internal vs External Tables
Below are the major differences between Internal vs External tables in Apache Hive.
|Internal or Managed Table
|By default, Hive creates an Internal or Managed Table.
|Use EXTERNAL option/clause to create an external table
|Hive owns the metadata, table data by managing the lifecycle of the table
|Hive manages the table metadata but not the underlying file.
|Dropping an Internal table drops metadata from Hive Metastore and files from HDFS
|Dropping an external table drops just metadata from Metastore with out touching actual file on HDFS.
|Hive supports ARCHIVE, UNARCHIVE, TRUNCATE, MERGE, CONCATENATE operations
|Supports result caching
When to use External and Internal Tables
Use managed tables when Hive should manage the lifecycle of the table, or when generating temporary tables.
Use external tables when files are already present or in remote locations, and the files should remain even if the table is dropped.
In this difference between the Internal and External tables article, you have learned internal/managed tables metadata and files are owned Hive server and manages complete table life cycle whereas only metadata is owned by external tables meaning dropping an external table just drops it’s metadata but not the actual file and also learned when to use internal table vs external table.
Happy Learning !!