You are currently viewing Hive Create Table Syntax & Usage with Examples

Hive CREATE TABLE statement is used to create a table, it is similar to creating a table in RDBMS using SQL syntax, additionally, Hive has many more features to work with files.

Advertisements

In this article, I will explain Hive CREATE TABLE usage and syntax, different types of tables Hive supports, where Hive stores table data in HDFS, how to change the default location, how to load the data from files to Hive table, and finally using partitions.

Table of Contents

1. Hive Create Table Syntax

By using CREATE TABLE statement you can create a table in Hive, It is similar to SQL and CREATE TABLE statement takes multiple optional clauses,


CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.] table_name
[(col_name data_type [column_constraint] [COMMENT col_comment], ...)]
[PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
[CLUSTERED BY (col_name, col_name,.......]
[COMMENT table_comment]
[ROW FORMAT row_format]
[FIELDS TERMINATED BY char]
[LINES TERMINATED BY char]
[LOCATION 'hdfs_path']
[STORED AS file_format]

In Hive, table can be created with or without the database, If you wanted to create in a database, specify database name qualifier.

Note: By default, the Hive tables are created in the warehouse directory, the location is specified in value for the key hive.metastore.warehouse.dir on the config file $HIVE_HOME/conf/hive-site.xml, by default, the warehouse location would be /user/hive/warehouse.

On this warehouse location, Hive creates a directory for every database you create and a subdirectory for every table, table data is stored in the table directory.

1.1 Create Table optional clauses

Below I have explained the most used optional clauses with examples.

IF NOT EXISTS – You can use IF NOT EXISTS to avoid the error in case the table is already present. Hive checks if the requesting table already presents,

EXTERNAL – Used to create external table

TEMPORARY – Used to create temporary table.

ROW FORMAT – Specifies the format of the row.

FIELDS TERMINATED BY – By default Hive use ^A field separator, To load a file that has a custom field separator like comma, pipe, tab use this option.

PARTITION BY – Used to create partition data. Using this improves performance.

CLUSTERED BY – Dividing the data into a specific number for buckets.

LOCATION – You can specify the custom location where to store the data on HDFS.

Besides these, Hive also supports many optional clauses.

2. Hive Create Table Example

2.1 Create Table

Here is a simple Hive example that creates an employee table in emp database with id, name, age and gender columns. we have also provided comment to the table.


CREATE TABLE IF NOT EXISTS emp.employee (
 id int,
 name string,
 age int,
 gender string )
 COMMENT 'Employee Table'
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY ',';
  • Note that the latest versions of Hive don’t support the CHAR datatype hence used string for gender.
  • By default, Hive uses ^A as a field separator while loading a file. but, I have a comma-separated file to load the data into this table hence, I’ve used ROW FORMAT DELIMITED FIELDS TERMINATED BY optional clause to specify the custom delimiter.

2.2 Describe Table

Use DESCRIBE command to describe the table.


jdbc:hive2://127.0.0.1:10000> DESCRIBE emp.employee;
Hive Create Table

2.3 Load File into table

Let’s see how to load a data file into the Hive table we just created.

  • Create a data file (for our example, I am creating a file with comma-separated fields)
  • Upload the data file (data.txt) to HDFS. Note you can also load the data from LOCAL without uploading to HDFS.
  • Now use the Hive LOAD command to load the file into the table.

LOAD DATA INPATH '/user/hive/data/data.txt' INTO TABLE emp.employee;

Note that after loading the data, the source file is not present in the location, and you can check the loaded file at HDFS.


hdfs dfs -ls /user/hive/warehouse/emp.db/employee/
-rw-r--r--   1 hive supergroup         52 2020-10-09 19:29 /user/hive/warehouse/emp.db/employee/data.txt

2.4 Select table

Use SELECT command to select the get the data from a table.


SELECT * FROM emp.employee
hive create table

2.5 Insert Date into Table

Like SQL, you can also use INSERT INTO to insert rows into Hive table.


INSERT INTO emp.employee values(7,'scott',23,'M');
INSERT INTO emp.employee values(8,'raman',50,'M');

3. Hive Table Types

3.1 Internal or Managed Table

By default, Hive creates an Internal table also known as the Managed table, In the managed table, Hive owns the data/files on the table meaning any data you insert or load files to the table are managed by the Hive process when you drop the table the underlying data or files are also get deleted.

Examples I have used above sections are Managed tables.

Let’s see another example on how to LOAD file from LOCAL file system.


LOAD DATA LOCAL INPATH '/home/hive/data.txt' INTO TABLE emp.employee;

Unlike loading from HDFS, source file from LOCAL file system won’t be removed.

3.2 External Table

Using EXTERNAL option you can create an external table, Hive doesn’t manage the external table, when you drop an external table, only table metadata from Metastore will be removed but the underlying files will not be removed and still they can be accessed via HDFS commands, Pig, Spark or any other Hadoop compatible tools.


CREATE EXTERNAL TABLE emp.employee_external (
 id int,
 name string,
 age int,
 gender string)
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY ','
 LOCATION '/user/hive/data/employee_external';

Now, let’s LOAD the data from local file system.


LOAD DATA LOCAL INPATH '/home/hive/data.txt' INTO TABLE emp.employee_external;

This will create a folder /user/hive/data/employee_external/ on HDFS and creates a file data.txt.

Let’s check if the file created at Hive external location on HDFS.


hdfs dfs -ls /user/hive/data/employee_external/
-rw-r--r--   1 hive supergroup         52 2020-10-09 20:29 /user/hive/data/employee_external/data.txt

As explained earlier dropping an external table from Hive will just remove the metadata from Metastore but not the actual file on HDFS.

Let’s see this in action by dropping the table emp.employee_external using DROP TABLE emp.employee_external command and check if the file still exists by running above hdfs -ls command.

3.3 Temporary Table

A temporary table is created using TEMPORARY option, these tables exist only within the current session, upon exiting the session the temporary tables will be removed and cannot be accessed in another session.

There are few limitations to the temporary table

Below is an example to create TEMPORARY table


CREATE TEMPORARY TABLE emp.employee_tmp (
 id int,
 name string,
 age int,
 gender string)
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY ',';

3.4 Transactional Table

Hive 4.0 supports another type of table called a Transactional table, Transactional Tables have support ACID operations like Insert, Update and Delete operations.


CREATE TRANSACTIONAL TABLE emp.employee_tmp (
 id int,
 name string,
 age int,
 gender string)
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY ','
 STORED AS ORC;

4. Create Table From Existing Table

4.1 Create Table As Select (CTAS)

Like RDBMS SQL, HiveQL also supports CREATE TABLE AS SELECT (CTAS) which is used to create a table from the results of the select query.


CREATE TABLE emp.filter AS SELECT id,name FROM emp.employee WHERE gender = 'F';

CTAS has these restrictions:

  • The target table cannot be an external table.
  • The target table cannot be a list bucketing table.

4.2 Create Table LIKE

CREATE TABLE LIKE is used to create a table with a structure or definition similar to the existing table without copying the data.


CREATE TABLE emp.similar LIKE emp.employee;

New table emp.similar is created with zero rows and the definition is the same as emp.employee

Hope this helps !!

Happy Learning !!

Leave a Reply

This Post Has One Comment

  1. Anonymous

    Thank you for this content