HBase Scan to Filter Rows like Where Clause


Post author: Naveen Nelamali
Post category: HBase
Post last modified: March 27, 2024
Reading time: 4 mins read

In this tutorial, you will learn how to use HBase scan to filter rows from a table using predicate conditions on columns, similar to the WHERE clause in SQL. To use filters, you first need to import certain Java classes into the HBase shell.


First, let’s use scan to print the data we are going to work with. If you don’t have the data yet, insert it into the HBase table first.

As we learned in previous chapters, scan is used to read data from an HBase table.


hbase> scan 'emp'
ROW                         COLUMN+CELL                                                                  
 1                          column=office:age, timestamp=1567542138673, value=20                         
 1                          column=office:name, timestamp=1567541857878, value=Scott                     
 2                          column=office:age, timestamp=1567541901009, value=50                         
 2                          column=office:gender, timestamp=1567541880523, value=M                       
 2                          column=office:name, timestamp=1567541868638, value=Mark                      
 3                          column=office:age, timestamp=1567542149583, value=30                         
 3                          column=office:name, timestamp=1567542103821, value=Jeff                      
 3                          column=office:salary, timestamp=1567542130044, value=40000                   
3 row(s)
Took 0.0823 seconds

SingleColumnValueFilter

To filter rows in the HBase shell using scan, you need to import the org.apache.hadoop.hbase.filter.SingleColumnValueFilter class, along with the CompareFilter and BinaryComparator classes, as shown below.


hbase> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter 
=> [Java::OrgApacheHadoopHbaseFilter::SingleColumnValueFilter]

hbase> import org.apache.hadoop.hbase.filter.CompareFilter
=> [Java::OrgApacheHadoopHbaseFilter::CompareFilter]

hbase> import org.apache.hadoop.hbase.filter.BinaryComparator
=> [Java::OrgApacheHadoopHbaseFilter::BinaryComparator]

Now, let’s run some filter examples.

Example 1: This example returns the rows where name equals ‘Jeff’ by using CompareFilter::CompareOp.valueOf('EQUAL') together with BinaryComparator.new(Bytes.toBytes('Jeff')).


hbase> scan 'emp', { FILTER => SingleColumnValueFilter.new(Bytes.toBytes('office'), Bytes.toBytes('name'), CompareFilter::CompareOp.valueOf('EQUAL'),BinaryComparator.new(Bytes.toBytes('Jeff')))}
ROW                         COLUMN+CELL                                                                  
 3                          column=office:age, timestamp=1567542149583, value=30                         
 3                          column=office:name, timestamp=1567542103821, value=Jeff                      
 3                          column=office:salary, timestamp=1567542130044, value=40000                   
1 row(s)
Took 0.0480 seconds
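
To make the row-level behaviour of this filter easier to reason about, here is a minimal plain-Ruby model of it (no HBase required). The helper name matching_rows and its filter_if_missing option are hypothetical and are not part of the HBase API. The model mirrors the documented behaviour of SingleColumnValueFilter: a row that lacks the tested column is still emitted unless the filter is told to drop it.

```ruby
# Plain-Ruby model of row-level filtering; this does not talk to HBase.
# Sample rows copied from the 'emp' table used in this tutorial.
emp = {
  '1' => { 'office:age' => '20', 'office:name' => 'Scott' },
  '2' => { 'office:age' => '50', 'office:gender' => 'M', 'office:name' => 'Mark' },
  '3' => { 'office:age' => '30', 'office:name' => 'Jeff', 'office:salary' => '40000' }
}

# Hypothetical helper: returns the keys of the rows that a
# SingleColumnValueFilter-style check would keep. The block is the
# predicate applied to the value of the tested column.
def matching_rows(rows, column, filter_if_missing: false)
  rows.select do |_key, cells|
    if cells.key?(column)
      yield cells[column]   # column present: keep the row only if the value passes
    else
      !filter_if_missing    # column absent: row is kept unless filter_if_missing is set
    end
  end.keys
end

jeff         = matching_rows(emp, 'office:name') { |v| v == 'Jeff' }
male_default = matching_rows(emp, 'office:gender') { |v| v == 'M' }
male_strict  = matching_rows(emp, 'office:gender', filter_if_missing: true) { |v| v == 'M' }

puts jeff.inspect          # rows whose name is Jeff
puts male_default.inspect  # rows 1 and 3 have no office:gender, so they are kept too
puts male_strict.inspect   # only rows that actually have office:gender == 'M'
```

In real HBase the corresponding switch is setFilterIfMissing(true) on the filter object. It matters for sparse columns such as office:gender, which only row 2 has.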

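Example 2: Let’s see how to filter rows where age is greater than or equal to 50, using CompareFilter::CompareOp.valueOf('GREATER_OR_EQUAL') together with BinaryComparator.new(Bytes.toBytes('50')). Note that BinaryComparator compares the stored bytes lexicographically, not numerically; the result is as expected here because every age in the table is stored as a two-digit string.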


hbase> scan 'emp', { FILTER => SingleColumnValueFilter.new(Bytes.toBytes('office'), Bytes.toBytes('age'), CompareFilter::CompareOp.valueOf('GREATER_OR_EQUAL'),BinaryComparator.new(Bytes.toBytes('50')))}
ROW                         COLUMN+CELL                                                                  
 2                          column=office:age, timestamp=1567541901009, value=50                         
 2                          column=office:gender, timestamp=1567541880523, value=M                       
 2                          column=office:name, timestamp=1567541868638, value=Mark                      
1 row(s)
Took 0.0180 seconds
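
Because the ages are stored as strings, the comparison above is byte-wise rather than numeric. The following plain-Ruby sketch (it does not call HBase; the helper name bytes_gte? is hypothetical) models that ordering and shows where it diverges from numeric ordering once values have different lengths.

```ruby
# Byte-wise (lexicographic) comparison, modelling how a binary comparator
# orders values that were stored as strings.
def bytes_gte?(value, threshold)
  (value.bytes <=> threshold.bytes) >= 0
end

two_digit   = bytes_gte?('50', '50')    # same width: behaves like a numeric comparison
three_digit = bytes_gte?('100', '50')   # '1' (0x31) sorts before '5' (0x35)
padded      = bytes_gte?('100', '050')  # zero-padding to a fixed width restores numeric order

puts two_digit
puts three_digit
puts padded
```

If a column can hold numbers of varying width, storing them zero-padded to a fixed width (or as fixed-width binary numbers) is one way to keep range filters meaningful.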

Example 3: This example checks the values of all columns for ones that begin with 40000 and returns only the matching cells, not the entire row.


hbase> scan 'emp', {FILTER => "ValueFilter (=,'binaryprefix:40000')"}
ROW                         COLUMN+CELL                                                                  
 3                          column=office:salary, timestamp=1567542130044, value=40000                   
1 row(s)
Took 0.0034 seconds
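
The string form of FILTER used in this example needs no Java imports. As far as I know, the same filter language also accepts SingleColumnValueFilter, so Examples 1 and 2 could probably be written this way too. Below is a small sketch of a helper that builds such a string. The helper name scvf_string is hypothetical, and the exact string syntax should be checked against the filter language documentation for your HBase version.

```ruby
# Hypothetical helper that builds a filter string for the shell's FILTER option.
# It assumes values are compared with the 'binary:' comparator prefix.
def scvf_string(family, qualifier, operator, value)
  "SingleColumnValueFilter('#{family}', '#{qualifier}', #{operator}, 'binary:#{value}')"
end

name_filter = scvf_string('office', 'name', '=', 'Jeff')
age_filter  = scvf_string('office', 'age', '>=', '50')

puts name_filter
puts age_filter

# In the HBase shell (which is JRuby), the string would be passed as:
#   scan 'emp', { FILTER => name_filter }
```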

References:

HBase filtering

SingleColumnValueFilter
