• Post author:
  • Post category:Apache Hive
  • Post last modified:January 21, 2023
  • Reading time:6 mins read
You are currently viewing Where Does Hive Stores Data Files in HDFS?

Hive stores data at the HDFS location /user/hive/warehouse folder if not specified a folder using the LOCATION clause while creating a table. Hive is a data warehouse database for Hadoop, all database and table data files are stored at HDFS location /user/hive/warehouse by default, you can also store the Hive data warehouse files either in a custom location on HDFS, S3, or any other Hadoop compatible file systems.

When you are working with Hive, you need to know about 2 different data stores.

  • Hive Metastore
  • Hive Data warehouse Location (Where Actual table data stored)

Hive Metastore location

Hive Metastore is used to store the metadata about the database and tables and by default, it uses the Derby database; You can change this to any RDBMS database like MySQL and Postgress e.t.c

By default the metastore database name is metastore_db.


metastore_db

Where Do Hive Stores Tables in HDFS?

Hive stores tables files by default at /user/hive/warehouse location on HDFS file system. You need to create these directories on HDFS before you use Hive.

On this location, you can find the directories for all databases you create and subdirectories with the table name you use.

While creating Hive tables, you can also specify the custom location where to store.

Get Storage Path From Property

In case you have a different location, you can get the path from hive.metastore.warehouse.dir property and this can be get by running the following command from a Hive Beeline CLI terminal.


set hive.metastore.warehouse.dir;
hive.metastore.warehouse.dir=/user/hive/warehouse

Get Storage Path From Configuration File

You can also get the path by looking value for hive.metastore.warehouse.dir property on $HIVE_HOME/conf/hive-site.xml file.

Using Hive Command

You can also get the hive storage path for a table by running the below command.


hive -S -e "DESCRIBE FORMATTED table_name;" | grep 'Location' | awk '{ print $NF }'

Table location can also get by running SHOW CREATE TABLE command from hive terminal.


SHOW CREATE TABLE table_name;
(or)
DESCRIBE FORMATTED table_name;

Hive Table Partition Location

If you have a partitioned table on Hive and the location of each partition file is different, you can get each partition file location from HDFS using the below command.


DESCRIBE FORMATTED db_name.table_name PARTITION (name=value)

Conclusion

In this article, you have learned where hive stores the table files and different ways to get the Hive data warehouse location on HDFS. You can get the data warehouse location from the property, config files, and commands.

Reference

Happy Learning !!

Prabha

Prabha is an accomplished data engineer with a wealth of experience in architecting, developing, and optimizing data pipelines and infrastructure. With a strong foundation in software engineering and a deep understanding of data systems, Prabha excels in building scalable solutions that handle diverse and large datasets efficiently. At SparkbyExamples.com Prabha writes her experience in Spark, PySpark, Python and Pandas.

Leave a Reply