Hive comes with a set of collection functions to work with Map and Array data types. These functions are used to find the size of the array, map types, get all map keys, values, sort array, and finding if an element exists in an array.
Related:
Hive Collection Functions List
Collection Functions | Usage & Description |
---|---|
size(Map<K.V>) | Returns length of the map type. Return: int |
size(Array<T>) | Returns length of the array type .columnReturn: int |
map_keys(Map<K.V>) | Returns all keys from the map. Return: array<K> |
map_values(Map<K.V>) | Returns all values from the map. Return: array<V> |
array_contains(Array<T>, value) | Returns TRUE if the array contains a value. Return: boolean |
sort_array(Array<T>) | Sorts the input array in ascending order. Return: array<t> |
Hive Collection Functions Examples
Before we jump into Hive collection functions examples, let’s create a Hive table with Array and Map types.
CREATE TABLE IF NOT EXISTS collection_example (
id int,
languages list,
properties map
)
COMMENT 'This is Hive collection Example'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':';
On this table, both array and map collections values are separated by pipe (|) <em>COLLECTION ITEMS TERMINATED BY '|'</em>
and map key and values are separated by colon (:) MAP KEYS TERMINATED BY ':'
Now, create a collection_types.csv file with below contents. To make it simple I will use local file system instead of HDFS.
1,Java|C#|Python,age:21|gender:M
2,PHP|Java|Scala,age:22|gender:F
3,PHP,age:40|gender:F
4,Java|Go|SAS,age:34|gender:M
5,,age:50
Load this file into our table using below command.
jdbc:hive2://> LOAD DATA LOCAL INPATH '/tmp/collection_types.csv'
. . . . . . > INTO TABLE collection_example;
jdbc:hive2://>SELECT * FROM collection_example; // Returns below data
size(Map<K.V>)
Hive size() function is used to get the length of the map type column. This returns the number of elements in the map type.
Return: int
jdbc:hive2://> SELECT properties,size(properties) as size
. . . . . . > FROM collection_example;
OK
+----------------------------+------+
| properties | size |
+----------------------------+------+
| {"age":"21","gender":"M"} | 2 |
| {"age":"22","gender":"F"} | 2 |
| {"age":"40","gender":"F"} | 2 |
| {"age":"34","gender":"M"} | 2 |
| {"age":"50"} | 1 |
+----------------------------+------+
5 rows selected (0.238 seconds)
0: jdbc:hive2://>
All of the rows on table contains 2 map elements and last record has just one.
size(Array<T>)
Hive size() has another alternative that takes an array type column as an argument and returns the number of elements(length) in the array type.
Return: int
jdbc:hive2://> SELECT languages,size(languages) as size
. . . . . . > FROM collection_example;
OK
+-------------------------+------+
| languages | size |
+-------------------------+------+
| ["Java","C#","Python"] | 3 |
| ["PHP","Java","Scala"] | 3 |
| ["PHP"] | 1 |
| ["Java","Go","SAS"] | 3 |
| [] | 0 |
+-------------------------+------+
5 rows selected (0.258 seconds)
map_keys(Map<K.V>)
map_keys() function returns an array containing all keys of the map column.
Return: array<K>
jdbc:hive2://> SELECT properties,map_keys(properties) as map_keys
. . . . . . > FROM collection_example;
OK
+----------------------------+-------------------+
| properties | map_keys |
+----------------------------+-------------------+
| {"age":"21","gender":"M"} | ["age","gender"] |
| {"age":"22","gender":"F"} | ["age","gender"] |
| {"age":"40","gender":"F"} | ["age","gender"] |
| {"age":"34","gender":"M"} | ["age","gender"] |
| {"age":"50"} | ["age"] |
+----------------------------+-------------------+
5 rows selected (0.273 seconds)
0: jdbc:hive2://>
map_values(Map<K.V>)
map_values() function returns an array containing all values of the map column.
Return: array<V>
jdbc:hive2://> SELECT properties,map_values(properties) as map_values
. . . . . . > FROM collection_example;
OK
+----------------------------+-------------+
| properties | map_values |
+----------------------------+-------------+
| {"age":"21","gender":"M"} | ["21","M"] |
| {"age":"22","gender":"F"} | ["22","F"] |
| {"age":"40","gender":"F"} | ["40","F"] |
| {"age":"34","gender":"M"} | ["34","M"] |
| {"age":"50"} | ["50"] |
+----------------------------+-------------+
5 rows selected (0.193 seconds)
0: jdbc:hive2://>
array_contains(Array<T>, value)
array_contains() function returns TRUE if the array column contains a value.
Return: boolean
jdbc:hive2://> SELECT languages,array_contains(languages,'Java')
. . . . . . > FROM collection_example;
OK
+-------------------------+--------+
| languages | _c1 |
+-------------------------+--------+
| ["Java","C#","Python"] | true |
| ["PHP","Java","Scala"] | true |
| ["PHP"] | false |
| ["Java","Go","SAS"] | true |
| [] | false |
+-------------------------+--------+
5 rows selected (0.359 seconds)
0: jdbc:hive2://>
sort_array(Array<T>)
sort_array() function sorts the input array column in ascending order.
Return: array<t>
jdbc:hive2://> SELECT sort_array(languages)
. . . . . . > FROM collection_example;
OK
+-------------------------+
| _c0 |
+-------------------------+
| ["C#","Java","Python"] |
| ["Java","PHP","Scala"] |
| ["PHP"] |
| ["Go","Java","SAS"] |
| [] |
+-------------------------+
5 rows selected (0.232 seconds)
0: jdbc:hive2://>
Happy Learning !!