You are currently viewing How to Set Variables in HIVE Scripts

Hive variables are key-value pairs that can be set using the set command and they can be used in scripts and Hive SQL. The values of the variables in Hive scripts are substituted during the query construct.

In this article, I will explain Hive variables, how to create and set values to the variables and use them on Hive QL and scripts, and finally passing them through the command line.

Table of Contents

1. What are the Hive variables?

When working with Hive QL and scripts we often required to use specific values for each environment, and hard-coding these values on code is not a good practice as the values changes for each environment.

To overcome this tight coupling of environment-specific values within the Hive QL script code, we can externalize these by creating variables and setting values outside of the scripts.

These variables are similar to Unix variables.

You can set these variables on Hive CLI (older version), Beeline, and Hive scripts.

Note that when you set values to variables they are local to the active Hive session and these values are not visible to other sessions.

2. Create and Set Hive variables

Hive stores variables in four different namespaces, namespace is a way to separate variables.

  • hiveconf
  • hivevar
  • system, and
  • env. 

2.1 Hive Config Variables (hiveconf)

hiveconf is the default namespace, if you don’t provide a namespace at the time of setting a variable, it will store your variable in hiveconf namespace by default. hiveconf namespace also contains several Hive default configuration variables.

Below examples set’s emp to table variable in hiveconf namespace.


0: jdbc:hive2://127.0.0.1:10000>SET table='emp';

In order to retrieve the variable from hiveconf, you have to explicitly specify the namespace. Not specifying namespace returns an error.


0: jdbc:hive2://127.0.0.1:10000>SELECT * FROM ${hiveconf:table}
# Below statement returns an error
0: jdbc:hive2://127.0.0.1:10000>SELECT * FROM ${table}

2.2 Hive Custom Variables (hivevar)

Hive version 0.8.0 introduced a new namespace hivevar to set the custom variables (JIRA HIVE-2020), this separates custom variables from Hive default config variables. Hive recommends using hivevar explicitly for custom variables.

Below example sets emp value to table variable in hivevar namespace.


0: jdbc:hive2://127.0.0.1:10000>SET hivevar:table='emp';

In order to retrieve values from hivevar namespace, you can either specify hivevar namespace or ignore it as hivevar is a default namespace for retrieval.


0: jdbc:hive2://127.0.0.1:10000>SELECT * FROM ${table}
(or)
0: jdbc:hive2://127.0.0.1:10000>SELECT * FROM ${hivevar:table}

2.3 Hive System Variables (system)

Hive default provides certain system variables and all system variables can be accessed in Hive using system namespace. For example.


system:java.version
system:user.timezone
system:user.home

2.4 Hive Environment Variables (env)

Hive also default provides certain environment variables and all environment variables can be accessed in Hive using env namespace. For example


env:PWD
env:USER

3. Set Variables in Hive Scripts

Hive scripts supports using all variables explained above, you can use any of these along with thier namespace.

let’s create test.hql file with the below content. If you notice, I am refering the table name from hivevar namespace.


0: jdbc:hive2://127.0.0.1:10000>SELECT * FROM ${hivevar:table};

Execute the test.hql script by running the below command.


bin/beeline -u jdbc:hive2://127.0.0.1:10000 scott tiger -f test.hql
(or)
bin/hive -f test.hql

Prior to executing the queries mentioned in the script, hive replaces the variables with the actual values from namespace during the query construct stage.

4. How to set and pass command line values to Hive scripts?

Hive also supports setting a variable from the command line when starting a Hive CLI or beeline.


bin/beeline -u jdbc:hive2://127.0.0.1:10000 scott tiger --hivevar table='emp'
bin/hive --hivevar table='emp'

You can refer this variable in Hive CLI.


0: jdbc:hive2://127.0.0.1:10000>SELECT & FROM ${hivevar:table};

You can also call test.hql script by setting command line variables.


bin/beeline -u jdbc:hive2://127.0.0.1:10000 scott tiger --hivevar table='dept' -f test.hql
(or)
bin/hive --hivevar table='dept' -f test.hql

This should select data from dept table.

5. List all Hive variables

From Hive scripts you can access environment (env), system, hive configuration and custom variables. since there are many variables it’s hard to remember all of them, use set to list on variables in hive terminal


0: jdbc:hive2://127.0.0.1:10000>set;
(or)
0: jdbc:hive2://127.0.0.1:10000>set -v; #This lists all variables with namespace

This set command list all available variables and configurations in Hive.

6. How variable substitute works?

Hive substitutes the value for a variable when a query is constructed with the variable.

  • Values on Hive variables are visible to only to active seesion where it’s been assign and they cannot be accessible from another session.

7. Disabling Variable Substitution

By default Hive substitutes all variables, you can disable these using (hive.variable.substitute=true) in case if you wanted to run a script without substitution variables.


0: jdbc:hive2://127.0.0.1:10000>set hive.variable.substitute=false;

Hope it helps!!

Happy Learning !!

Related Articles

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium

Leave a Reply