Hive variables are key-value pairs that can be set using the set command and they can be used in scripts and Hive SQL. The values of the variables in Hive scripts are substituted during the query construct.
In this article, I will explain Hive variables, how to create and set values to the variables and use them on Hive QL and scripts, and finally passing them through the command line.
Table of Contents
- What are the Hive variables
- Create and Set Hive variables
- Set Variables in Hive Scripts
- Command-line values to Hive scripts
- List all Hive variables
- How variable substitute works
- Disabling Variable Substitution
1. What are the Hive variables?
When working with Hive QL and scripts we often required to use specific values for each environment, and hard-coding these values on code is not a good practice as the values changes for each environment.
To overcome this tight coupling of environment-specific values within the Hive QL script code, we can externalize these by creating variables and setting values outside of the scripts.
These variables are similar to Unix variables.
You can set these variables on Hive CLI (older version), Beeline, and Hive scripts.
Note that when you set values to variables they are local to the active Hive session and these values are not visible to other sessions.
2. Create and Set Hive variables
Hive stores variables in four different namespaces, namespace is a way to separate variables.
- hiveconf
- hivevar
- system, and
- env.
2.1 Hive Config Variables (hiveconf)
hiveconf is the default namespace, if you don’t provide a namespace at the time of setting a variable, it will store your variable in hiveconf namespace by default. hiveconf namespace also contains several Hive default configuration variables.
Below examples set’s emp to table variable in hiveconf namespace.
0: jdbc:hive2://127.0.0.1:10000>SET table='emp';
In order to retrieve the variable from hiveconf, you have to explicitly specify the namespace. Not specifying namespace returns an error.
0: jdbc:hive2://127.0.0.1:10000>SELECT * FROM ${hiveconf:table}
# Below statement returns an error
0: jdbc:hive2://127.0.0.1:10000>SELECT * FROM ${table}
2.2 Hive Custom Variables (hivevar)
Hive version 0.8.0 introduced a new namespace hivevar
to set the custom variables (JIRA HIVE-2020), this separates custom variables from Hive default config variables. Hive recommends using hivevar explicitly for custom variables.
Below example sets emp value to table variable in hivevar
namespace.
0: jdbc:hive2://127.0.0.1:10000>SET hivevar:table='emp';
In order to retrieve values from hivevar
namespace, you can either specify hivevar namespace or ignore it as hivevar is a default namespace for retrieval.
0: jdbc:hive2://127.0.0.1:10000>SELECT * FROM ${table}
(or)
0: jdbc:hive2://127.0.0.1:10000>SELECT * FROM ${hivevar:table}
2.3 Hive System Variables (system)
Hive default provides certain system variables and all system variables can be accessed in Hive using system
namespace. For example.
system:java.version
system:user.timezone
system:user.home
2.4 Hive Environment Variables (env)
Hive also default provides certain environment variables and all environment variables can be accessed in Hive using env
namespace. For example
env:PWD
env:USER
3. Set Variables in Hive Scripts
Hive scripts supports using all variables explained above, you can use any of these along with thier namespace.
let’s create test.hql file with the below content. If you notice, I am refering the table name from hivevar
namespace.
0: jdbc:hive2://127.0.0.1:10000>SELECT * FROM ${hivevar:table};
Execute the test.hql script by running the below command.
bin/beeline -u jdbc:hive2://127.0.0.1:10000 scott tiger -f test.hql
(or)
bin/hive -f test.hql
Prior to executing the queries mentioned in the script, hive replaces the variables with the actual values from namespace during the query construct stage.
4. How to set and pass command line values to Hive scripts?
Hive also supports setting a variable from the command line when starting a Hive CLI or beeline.
bin/beeline -u jdbc:hive2://127.0.0.1:10000 scott tiger --hivevar table='emp'
bin/hive --hivevar table='emp'
You can refer this variable in Hive CLI.
0: jdbc:hive2://127.0.0.1:10000>SELECT & FROM ${hivevar:table};
You can also call test.hql script by setting command line variables.
bin/beeline -u jdbc:hive2://127.0.0.1:10000 scott tiger --hivevar table='dept' -f test.hql
(or)
bin/hive --hivevar table='dept' -f test.hql
This should select data from dept table.
5. List all Hive variables
From Hive scripts you can access environment (env), system, hive configuration and custom variables. since there are many variables it’s hard to remember all of them, use set to list on variables in hive terminal
0: jdbc:hive2://127.0.0.1:10000>set;
(or)
0: jdbc:hive2://127.0.0.1:10000>set -v; #This lists all variables with namespace
This set command list all available variables and configurations in Hive.
6. How variable substitute works?
Hive substitutes the value for a variable when a query is constructed with the variable.
- Values on Hive variables are visible to only to active seesion where it’s been assign and they cannot be accessible from another session.
7. Disabling Variable Substitution
By default Hive substitutes all variables, you can disable these using (hive.variable.substitute=true) in case if you wanted to run a script without substitution variables.
0: jdbc:hive2://127.0.0.1:10000>set hive.variable.substitute=false;
Hope it helps!!
Happy Learning !!
Related Articles
- Start HiveServer2 and connect using Beeline
- Where does Hive stores its data in HDFS
- Using Hive Connection String to access from Java
- Connect to Hive using JDBC connection
- Apache Hive Installation on Ubuntu
- How to Update or Drop a Hive Partition?
- Hive – Start HiveServer2 and Beeline