You are currently viewing Hive Built-in String Functions with Examples

Hive supports several built-in string functions similar to SQL functions to manipulate the strings. These Hive string functions come in handy when you are doing transformations without bringing data into Spark and using String functions or any equivalent frameworks.

Advertisements

In this article let’s learn the most used String Functions syntax, usage, description along with examples.

Hive String Functions List

With every new version, Hive has been releasing new String functions to work with Query Language (HiveQL), you can use these built-in functions on Hive Beeline CLI Interface or on HQL queries using different languages and frameworks.

When you need to perform any string manipulations, we often think to write our own code (natural tendency of developers)/UDF however, I would recommend do some research to find out existing functions before you write to UDF as UDF leads to lot of performance issues.

Most of these function returns String data type, functions that return other types I will call it explicitly.

String FunctionUsage & Description
ascii(string str)Returns ASCII numeric value of the first character of the input argument.
Return: int
base64(binary bin)Used to convert the binary input argument to base64 string.
character_length(string str)Returns the number of UTF-8 characters contained in an input string, You can also use alternative char_length(string str).
Return: int

concat(string|binary A, string|binary B…)
Returns the string or bytes after concatenating all strings or bytes passed in as input.
concat_ws(string SEP, string A, string B…)Similar to concat() additionally takes separator SEP. You should use this if you wanted to concatenate the input strings with a delimiter.
decode(binary bin, string charset)Decodes the binary into a string using the charset provided in the second argument.
encode(string src, string charset)Encodes the string into binary type using the character set provided in the second argument.
Return: binary
format_number(number x, int d)Returns the formatted number like ‘#,###,###.##’
instr(string str, string substr)It looks for the first occurrence of substr in str and returns the position. It is based on Index 1 no zero.
Return: int or null when one of the input is null
length(string A)Returns the length of the string.
Return: int
locate(string substr, string str[, int pos])Returns the position of the first occurrence of substr in str after position pos.
Return: int
lower(string A) lcase(string A)Converts the input string into lower case.
lpad(string str, int len, string pad)Returns string after appending string at the left by the length specified. In case of empty pad string, the return value is null.
ltrim(string A)Returns a string after trimming all spaces at the left side of the input string
printf(String format, Obj… args)Returns a formatted input string.
quote(String text)Returns the quoted string (Includes escape character for any single quotes HIVE-4.0.0)
regregexp_extract(string subject, string pattern, int index)It extracts the string matching with regular expression.
regexp_replace()regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT)
repeat(string str, int n)returns a String after repeating n number of times.
replace(string A, string OLD, string NEW)Replace all occurrences of the old string with the new string on input string A.
reverse(string A)Reverses the input string.
rpad(string str, int len, string pad)Right padding the characters.
rtrim(string A)Returns a string after trimming all spaces at the right side of the input string.
space(int n)Returns a string with n spaces.
split(string str, string pat)Split the input string str by the regular pattern specified.
substr(string|binary A, int start) substring(string|binary A, int start)Extract the substring from the start position.
trim(string A)trimming spaces from left and right.
upper(string A) ucase(string A)Converts the input string into upper case.
initcap(string A)Capitalize the first letter of each word of an input string sentence.
soundex()Returns Soundex code of the input string.
levenshtein(string A, string B)Returns the Levenshtein distance between two strings

Hive String Functions Examples

Below are some of the Hive most using String functions with examples. Hope these are useful.

ascii() function converts char to ascii value


0: jdbc:hive2://> select ascii("ABC"); ==> Returns 65

character_length() returns length of the string


jdbc:hive2://> select character_length("ABC"); ==> Returns 3

concat() function concatenates the multiple strings


jdbc:hive2://> select concat("ABC","DEF"); ==> Returns ABCDEF

concat_ws() function concatenates string with delimiter separator


jdbc:hive2://> select concat_ws("|","1","2","3"); ==> Returns 1|2|3

format_number() function formats integer with comma separator


jdbc:hive2://> select format_number(1234567,3); ==> Returns 1,234,567.000
jdbc:hive2://> select format_number(1234567,0); ==> Returns 1,234,567
jdbc:hive2://> select format_number(1234567.23456,3); ==>  1,234,567.235

instr() searches for a string in a input string and returns the index


jdbc:hive2://> select instr("usa is a land of opportunity","is"); ==> Returns 5

length() function returns the length of the string


jdbc:hive2://> select length("123456789"); ==> Returns 9

locate() Example

similar to instr function but returns the string position found after the specified index


jdbc:hive2://> select locate("is","usa is a usa is a"); ==> Returns 5
jdbc:hive2://> select locate("is","usa is a usa is a",6); ==> Returns 14

lower() returns the string in lower case

lcase() similar to lower() that returns the string in lower case

ltrim() left trim, usually used to trim the spaces on left side of the string.


jdbc:hive2://> select lower("UNITEDSTATES"); ==> unitedstates
jdbc:hive2://> select lcase("UNITEDSTATES"); ==> unitedstates
jdbc:hive2://> select ltrim("    UNITEDSTATES"); ==> UNITEDSTATES

printf() returns the formatted string

To format the output string, sse %s for string, %d for integer number and %f for float.


jdbc:hive2://> select printf('%d UNITED',45); ==> Returns 45 UNITED
jdbc:hive2://> select printf('%S UNITED %d','USA',45); ==> USA UNITED 45 

regregexp_extract() Example


regexp_replace() Example


repeat() the input string n number of times


jdbc:hive2://> select repeat("USA",5); ==> Returns USAUSAUSAUSAUSA

replace() all occurances of a string with a new string


jdbc:hive2://> select replace("ABC CDF ABC CDF","CDF",'XYZ'); Returns ABC XYZ ABC XYZ

reverse() function reverse the string


jdbc:hive2://> select reverse("ABCDEF"); ==> Returns FEDCBA

rpad() right pad is used to add the spaces or characters to input string

If input string is longer than length, the return value is shortened to specified len characters


jdbc:hive2://> select rpad("UNITED",10,'0'); ==> Returns UNITED0000
jdbc:hive2://> select rpad("UNITED",10,' '); ==> Returns 'UNITED    ' // with out quotes
jdbc:hive2://> select rpad("UNITEDSTATES",10,'0'); ==> Returns UNITEDSTAT
jdbc:hive2://> select rpad("UNITEDSTATES",10,null); ==> Returns  NULL

rtrim() eliminates the spaces from the right (right trim ) side of the string. For example, trim(‘ bar ‘) results in ‘ bar’

space() returns a string with the specified number of spaces.


0: jdbc:hive2://> select space(10); ==> Returns '          '

split() the string into an array based on delimiter separator (comma, pipe, space e.t.c)

Below example splits using space delimiter.


jdbc:hive2://> select split("USA IS A PLACE"," ");
Returns: ["USA","IS","A","PLACE"]

substr() | substring() is used to get the part of the string from the starting index and n number of characters.


jdbc:hive2://> select substr("USA IS A PLACE",5,2); ==> Returns IS
jdbc:hive2://> select substr("USA IS A PLACE",5,100); ==> Returns IS A PLACE

trim() – trimming spaces from both ends of a string. For example, trim(‘ bar ‘) results in ‘bar’

This is a result of using ltrim and rtrim functions.

upper() | ucase() changes the string to upper case or all caps.


jdbc:hive2://> select upper("unitedstates"); ==> Returns UNITEDSTATES
jdbc:hive2://> select ucase("unitedstates"); ==> Returns UNITEDSTATES

initcap() returns input string after converting first letter of a word to upper/capital case


jdbc:hive2://> select initcap("USA IS A PLACE");
Returns: Usa Is A Place

soundex() function returns soundex of the string


jdbc:hive2://> select soundex("PLACE"); ==> Returns P420
jdbc:hive2://> select soundex("UNITED"); ==> Returns U533

You May Like Related Articles

Happy Learning !!

Leave a Reply