<strong>CREATE TABLE</strong> statement is used to create a table, it is similar to creating a table in RDBMS using SQL syntax, additionally, Hive has many more features to work with files.
In this article, I will explain Hive CREATE TABLE usage and syntax, different types of tables Hive supports, where Hive stores table data in HDFS, how to change the default location, how to load the data from files to Hive table, and finally using partitions.
Table of Contents
- Hive Create Table Syntax
- Hive Create Table & Insert Example
- Hive Table Types
- Create Table From Existing Table
1. Hive Create Table Syntax
CREATE TABLE statement you can create a table in Hive, It is similar to SQL and
CREATE TABLE statement takes multiple optional clauses,
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.] table_name [(col_name data_type [column_constraint] [COMMENT col_comment], ...)] [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)] [CLUSTERED BY (col_name, col_name,.......] [COMMENT table_comment] [ROW FORMAT row_format] [FIELDS TERMINATED BY char] [LINES TERMINATED BY char] [LOCATION 'hdfs_path'] [STORED AS file_format]
In Hive, table can be created with or without the database, If you wanted to create in a database, specify database name qualifier.
Note: By default, the Hive tables are created in the warehouse directory, the location is specified in value for the key
hive.metastore.warehouse.dir on the config file
$HIVE_HOME/conf/hive-site.xml, by default, the warehouse location would be
On this warehouse location, Hive creates a directory for every database you create and a subdirectory for every table, table data is stored in the table directory.
1.1 Create Table optional clauses
Below I have explained the most used optional clauses with examples.
IF NOT EXISTS – You can use IF NOT EXISTS to avoid the error in case the table is already present. Hive checks if the requesting table already presents,
EXTERNAL – Used to create external table
TEMPORARY – Used to create temporary table.
ROW FORMAT – Specifies the format of the row.
FIELDS TERMINATED BY – By default Hive use ^A field separator, To load a file that has a custom field separator like comma, pipe, tab use this option.
PARTITION BY – Used to create partition data. Using this improves performance.
CLUSTERED BY – Dividing the data into a specific number for buckets.
LOCATION – You can specify the custom location where to store the data on HDFS.
Besides these, Hive also supports many optional clauses.
2. Hive Create Table Example
2.1 Create Table
Here is a simple Hive example that creates an employee table in emp database with id, name, age and gender columns. we have also provided comment to the table.
CREATE TABLE IF NOT EXISTS emp.employee ( id int, name string, age int, gender string ) COMMENT 'Employee Table' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
- Note that the latest versions of Hive don’t support the
CHARdatatype hence used
- By default, Hive uses ^A as a field separator while loading a file. but, I have a comma-separated file to load the data into this table hence, I’ve used
ROW FORMAT DELIMITED FIELDS TERMINATED BYoptional clause to specify the custom delimiter.
2.2 Describe Table
DESCRIBE command to describe the table.
jdbc:hive2://127.0.0.1:10000> DESCRIBE emp.employee;
2.3 Load File into table
Let’s see how to load a data file into the Hive table we just created.
- Create a data file (for our example, I am creating a file with comma-separated fields)
- Upload the data file (data.txt) to HDFS. Note you can also load the data from
LOCALwithout uploading to HDFS.
- Now use the Hive LOAD command to load the file into the table.
LOAD DATA INPATH '/user/hive/data/data.txt' INTO TABLE emp.employee;
Note that after loading the data, the source file is not present in the location, and you can check the loaded file at HDFS.
hdfs dfs -ls /user/hive/warehouse/emp.db/employee/ -rw-r--r-- 1 hive supergroup 52 2020-10-09 19:29 /user/hive/warehouse/emp.db/employee/data.txt
2.4 Select table
SELECT command to select the get the data from a table.
SELECT * FROM emp.employee
2.5 Insert Date into Table
Like SQL, you can also use
INSERT INTO to insert rows into Hive table.
INSERT INTO emp.employee values(7,'scott',23,'M'); INSERT INTO emp.employee values(8,'raman',50,'M');
3. Hive Table Types
3.1 Internal or Managed Table
By default, Hive creates an Internal table also known as the Managed table, In the managed table, Hive owns the data/files on the table meaning any data you insert or load files to the table are managed by the Hive process when you drop the table the underlying data or files are also get deleted.
Examples I have used above sections are Managed tables.
Let’s see another example on how to
LOAD file from
LOCAL file system.
LOAD DATA LOCAL INPATH '/home/hive/data.txt' INTO TABLE emp.employee;
Unlike loading from HDFS, source file from
LOCAL file system won’t be removed.
3.2 External Table
EXTERNAL option you can create an external table, Hive doesn’t manage the external table, when you drop an external table, only table metadata from Metastore will be removed but the underlying files will not be removed and still they can be accessed via HDFS commands, Pig, Spark or any other Hadoop compatible tools.
CREATE EXTERNAL TABLE emp.employee_external ( id int, name string, age int, gender string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/user/hive/data/employee_external';
Now, let’s LOAD the data from local file system.
LOAD DATA LOCAL INPATH '/home/hive/data.txt' INTO TABLE emp.employee_external;
This will create a folder
/user/hive/data/employee_external/ on HDFS and creates a file
Let’s check if the file created at Hive external location on HDFS.
hdfs dfs -ls /user/hive/data/employee_external/ -rw-r--r-- 1 hive supergroup 52 2020-10-09 20:29 /user/hive/data/employee_external/data.txt
As explained earlier dropping an external table from Hive will just remove the metadata from Metastore but not the actual file on HDFS.
Let’s see this in action by dropping the table emp.employee_external using
DROP TABLE emp.employee_external command and check if the file still exists by running above
hdfs -ls command.
3.3 Temporary Table
A temporary table is created using
TEMPORARY option, these tables exist only within the current session, upon exiting the session the temporary tables will be removed and cannot be accessed in another session.
There are few limitations to the temporary table
- Cannot Create Partitioned Table
- Indexes are not supported
Below is an example to create
CREATE TEMPORARY TABLE emp.employee_tmp ( id int, name string, age int, gender string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
3.4 Transactional Table
Hive 4.0 supports another type of table called Transactional tables., Transactional Tables have support ACID operations like Insert, Update and Delete operations.
CREATE TRANSACTIONAL TABLE emp.employee_tmp ( id int, name string, age int, gender string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS ORC;
4. Create Table From Existing Table
4.1 Create Table As Select (CTAS)
Like RDBMS SQL, HiveQL also supports
CREATE TABLE AS SELECT (CTAS) which is used to create a table from the results of the select query.
CREATE TABLE emp.filter AS SELECT id,name FROM emp.employee WHERE gender = 'F';
CTAS has these restrictions:
- The target table cannot be an external table.
- The target table cannot be a list bucketing table.
4.2 Create Table LIKE
CREATE TABLE LIKE is used to create a table with a structure or definition similar to the existing table without copying the data.
CREATE TABLE emp.similar LIKE emp.employee;
New table emp.similar is created with zero rows and the definition is the same as emp.employee
Hope this helps !!
Happy Learning !!