• Post author:
  • Post category:Apache Hive
  • Post last modified:January 6, 2023
  • Reading time:7 mins read
You are currently viewing Hive – Difference Between Internal Tables vs External Tables?

Let’s learn what are Internal (Managed) and External tables and their differences, the main difference between Hive external table vs internal tables are owned and managed by Hive whereas external tables are not managed by Hive.

In this article I will explain the difference between internal vs external table, by creating them, checking the location, and dropping them.

Managed or External tables can be identified using the DESCRIBE FORMATTED table_name command

Related Articles:

What is Hive Internal or Managed Table?

Internal tables are also known as Managed tables that are owned and managed by Hive. By default, Hive creates a table as an Internal table and owned the table structure and the files.

In other words, Hive completely manages the lifecycle of the table (metadata & data) similar to tables in RDBMS.

For Internal tables, Hive by default stores the files at the data warehouse location which is located at /user/hive/warehouse

When you drop an internal table, it drops the data and also drops the metadata of the table.

Below is an example of creating internal table.


CREATE TABLE IF NOT EXISTS emp.employee (
 id int,
 name string,
 age int,
 gender string )
 COMMENT 'Employee Table'
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY ',';

Use DESCRIBE FORMATTED emp.employee; to get the description of the table and you should see Table Type as MANAGED TABLE.

Hive internal vs external table

What is Hive External Table?

Data in External tables are not owned or managed by Hive. To create an External table you need to use EXTERNAL clause.

Hive default stores external table files also at Hive managed data warehouse location but recommends to use external location using LOCATION clause.

Dropping an external table just drops the metadata but not the actual data. The actual data is still accessible outside of Hive.

Below is an example of creating an external table in Hive. If you noticed we use EXTERNAL and LOCATION options.


CREATE EXTERNAL TABLE emp.employee_external (
 id int,
 name string,
 age int,
 gender string)
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY ','
 LOCATION '/user/hive/data/employee_external';

Use DESCRIBE FORMATTED emp.employee_external; to get the description of the table and you should see Table Type as EXTERNAL TABLE.

Hive managed vs external table

Drop Internal vs External Tables

Regardless of the Internal and external table, Hive manages the table definition and its partition information in Hive Metastore. Dropping an internal table deletes the table metadata from Metastore and also removes all its data/files from HDFS.

Dropping an external table, just drop the metadata of the table from Metastore and keeps the actual data as-is om HDFS location.

Difference Between Internal vs External Tables

Below are the major differences between Internal vs External tables in Apache Hive.

Internal or Managed TableExternal Table
By default, Hive creates an Internal or Managed Table.Use EXTERNAL option/clause to create an external table
Hive owns the metadata, table data by managing the lifecycle of the tableHive manages the table metadata but not the underlying file.
Dropping an Internal table drops metadata from Hive Metastore and files from HDFSDropping an external table drops just metadata from Metastore with out touching actual file on HDFS.
Hive supports ARCHIVE, UNARCHIVE, TRUNCATE, MERGE, CONCATENATE operationsNot supported
Supports ACID/Transactional Not supported
Supports result cachingNot supported

When to use External and Internal Tables

Use managed tables when Hive should manage the lifecycle of the table, or when generating temporary tables.

Use external tables when files are already present or in remote locations, and the files should remain even if the table is dropped.

Conclusion

In this difference between the Internal and External tables article, you have learned internal/managed tables metadata and files are owned Hive server and manages complete table life cycle whereas only metadata is owned by external tables meaning dropping an external table just drops it’s metadata but not the actual file and also learned when to use internal table vs external table.

Happy Learning !!

Related Article

Prabha

Prabha is an accomplished data engineer with a wealth of experience in architecting, developing, and optimizing data pipelines and infrastructure. With a strong foundation in software engineering and a deep understanding of data systems, Prabha excels in building scalable solutions that handle diverse and large datasets efficiently. At SparkbyExamples.com Prabha writes her experience in Spark, PySpark, Python and Pandas.

Leave a Reply