Site icon Spark By {Examples}

Where Does Hive Stores Data Files in HDFS?

hive store hdfs location

Hive stores data at the HDFS location /user/hive/warehouse folder if not specified a folder using the LOCATION clause while creating a table. Hive is a data warehouse database for Hadoop, all database and table data files are stored at HDFS location /user/hive/warehouse by default, you can also store the Hive data warehouse files either in a custom location on HDFS, S3, or any other Hadoop compatible file systems.

When you are working with Hive, you need to know about 2 different data stores.

Hive Metastore location

Hive Metastore is used to store the metadata about the database and tables and by default, it uses the Derby database; You can change this to any RDBMS database like MySQL and Postgress e.t.c

By default the metastore database name is metastore_db.


metastore_db

Where Do Hive Stores Tables in HDFS?

Hive stores tables files by default at /user/hive/warehouse location on HDFS file system. You need to create these directories on HDFS before you use Hive.

On this location, you can find the directories for all databases you create and subdirectories with the table name you use.

While creating Hive tables, you can also specify the custom location where to store.

Get Storage Path From Property

In case you have a different location, you can get the path from hive.metastore.warehouse.dir property and this can be get by running the following command from a Hive Beeline CLI terminal.


set hive.metastore.warehouse.dir;
hive.metastore.warehouse.dir=/user/hive/warehouse

Get Storage Path From Configuration File

You can also get the path by looking value for hive.metastore.warehouse.dir property on $HIVE_HOME/conf/hive-site.xml file.

Using Hive Command

You can also get the hive storage path for a table by running the below command.


hive -S -e "DESCRIBE FORMATTED table_name;" | grep 'Location' | awk '{ print $NF }'

Table location can also get by running SHOW CREATE TABLE command from hive terminal.


SHOW CREATE TABLE table_name;
(or)
DESCRIBE FORMATTED table_name;

Hive Table Partition Location

If you have a partitioned table on Hive and the location of each partition file is different, you can get each partition file location from HDFS using the below command.


DESCRIBE FORMATTED db_name.table_name PARTITION (name=value)

Conclusion

In this article, you have learned where hive stores the table files and different ways to get the Hive data warehouse location on HDFS. You can get the data warehouse location from the property, config files, and commands.

Reference

Happy Learning !!

Exit mobile version