Spark By {Examples}

HBase Scan to Filter Rows like Where Clause

In this tutorial, you will learn how to use HBase Scan to filter the rows/records from a table using predicate conditions on columns similar to the WHERE clause in SQL. In order to use filters, you need to import certain Java classes into HBase Shell.

First, let’s print the data we are going to work with using scan. If you don’t have the data, please insert it into the HBase table first.

As we learned in previous chapters, scan is used to read data from an HBase table.


hbase> scan 'emp'
ROW                         COLUMN+CELL                                                                  
 1                          column=office:age, timestamp=1567542138673, value=20                         
 1                          column=office:name, timestamp=1567541857878, value=Scott                     
 2                          column=office:age, timestamp=1567541901009, value=50                         
 2                          column=office:gender, timestamp=1567541880523, value=M                       
 2                          column=office:name, timestamp=1567541868638, value=Mark                      
 3                          column=office:age, timestamp=1567542149583, value=30                         
 3                          column=office:name, timestamp=1567542103821, value=Jeff                      
 3                          column=office:salary, timestamp=1567542130044, value=40000                   
3 row(s)
Took 0.0823 seconds                                                                                      

SingleColumnValueFilter

To filter rows in the HBase shell using scan, you need to import the org.apache.hadoop.hbase.filter.SingleColumnValueFilter class along with the CompareFilter and BinaryComparator classes, as shown below.


hbase> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter 
=> [Java::OrgApacheHadoopHbaseFilter::SingleColumnValueFilter]

hbase> import org.apache.hadoop.hbase.filter.CompareFilter
=> [Java::OrgApacheHadoopHbaseFilter::CompareFilter]

hbase> import org.apache.hadoop.hbase.filter.BinaryComparator
=> [Java::OrgApacheHadoopHbaseFilter::BinaryComparator]

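Now, let’s run some filter examples.

Example 1: This example returns the row where name equals 'Jeff' by using CompareFilter::CompareOp.valueOf('EQUAL') together with BinaryComparator.new(Bytes.toBytes('Jeff')). Because SingleColumnValueFilter works at the row level, all columns of the matching row are returned.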


hbase> scan 'emp', { FILTER => SingleColumnValueFilter.new(Bytes.toBytes('office'), Bytes.toBytes('name'), CompareFilter::CompareOp.valueOf('EQUAL'),BinaryComparator.new(Bytes.toBytes('Jeff')))}
ROW                         COLUMN+CELL                                                                  
 3                          column=office:age, timestamp=1567542149583, value=30                         
 3                          column=office:name, timestamp=1567542103821, value=Jeff                      
 3                          column=office:salary, timestamp=1567542130044, value=40000                   
1 row(s)
Took 0.0480 seconds         
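
The row-level behaviour of Example 1 can be sketched outside HBase. The plain Ruby model below is only an illustration, not the HBase implementation: the EMP hash copies the sample data, and single_column_equal with its filter_if_missing option is a hypothetical helper. It also shows one documented default of SingleColumnValueFilter: rows that do not contain the tested column are still returned unless setFilterIfMissing(true) is called on the filter.

```ruby
# Sample rows copied from the 'emp' scan output; keys are "family:qualifier".
EMP = {
  '1' => { 'office:age' => '20', 'office:name' => 'Scott' },
  '2' => { 'office:age' => '50', 'office:gender' => 'M', 'office:name' => 'Mark' },
  '3' => { 'office:age' => '30', 'office:name' => 'Jeff', 'office:salary' => '40000' }
}.freeze

# Hypothetical helper imitating a row-level equality filter on one column.
# A row passes when the column equals `expected`; all of its cells are kept.
# Rows lacking the column also pass unless filter_if_missing is true.
def single_column_equal(rows, column, expected, filter_if_missing: false)
  rows.select do |_row_key, cells|
    if cells.key?(column)
      cells[column] == expected
    else
      !filter_if_missing
    end
  end
end

matched = single_column_equal(EMP, 'office:name', 'Jeff')
puts matched.keys.inspect       # ["3"]
puts matched['3'].keys.inspect  # ["office:age", "office:name", "office:salary"]

# Only row 2 has office:gender, yet rows 1 and 3 pass by default.
puts single_column_equal(EMP, 'office:gender', 'M').keys.inspect
# ["1", "2", "3"]
puts single_column_equal(EMP, 'office:gender', 'M', filter_if_missing: true).keys.inspect
# ["2"]
```

In the sample table every row has office:name and office:age, so the default does not change the results of the examples here, but it matters for sparse columns such as office:gender.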

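Example 2: Let’s see how to filter rows where age is greater than or equal to 50, using CompareFilter::CompareOp.valueOf('GREATER_OR_EQUAL') together with BinaryComparator.new(Bytes.toBytes('50')). Note that the ages are stored as strings, so BinaryComparator compares them byte by byte rather than numerically.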


hbase> scan 'emp', { FILTER => SingleColumnValueFilter.new(Bytes.toBytes('office'), Bytes.toBytes('age'), CompareFilter::CompareOp.valueOf('GREATER_OR_EQUAL'),BinaryComparator.new(Bytes.toBytes('50')))}
ROW                         COLUMN+CELL                                                                  
 2                          column=office:age, timestamp=1567541901009, value=50                         
 2                          column=office:gender, timestamp=1567541880523, value=M                       
 2                          column=office:name, timestamp=1567541868638, value=Mark                      
1 row(s)
Took 0.0180 seconds                                
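
Example 2 works on this data because all ages have two digits. As a hedged illustration of why that matters, the plain Ruby sketch below imitates a byte-wise comparison of string values; bytes_compare and pad_age are hypothetical helpers, not part of HBase, and zero-padding is just one possible way to keep byte order aligned with numeric order.

```ruby
# Compare two strings byte by byte, left to right, the way a byte-wise
# comparator orders values (a shorter value that is a prefix sorts first).
def bytes_compare(a, b)
  a.bytes <=> b.bytes
end

# Hypothetical helper: left-pad a non-negative integer with zeros so that
# byte order agrees with numeric order for values of up to `width` digits.
def pad_age(n, width = 3)
  n.to_s.rjust(width, '0')
end

puts bytes_compare('50', '30')                 # 1: '50' sorts after '30'
puts bytes_compare('100', '50')                # -1: '100' sorts before '50'
puts bytes_compare(pad_age(100), pad_age(50))  # 1: '100' sorts after '050'
```

With string-encoded numbers, an age of 100 would therefore not pass the GREATER_OR_EQUAL '50' filter unless the stored values use a fixed width or a binary numeric encoding.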

Example 3: This example checks the values of all columns and returns only the cells whose value starts with 40000. It uses the filter-string syntax, so no imports are needed.


hbase> scan 'emp', {FILTER => "ValueFilter (=,'binaryprefix:40000')"}
ROW                         COLUMN+CELL                                                                  
 3                          column=office:salary, timestamp=1567542130044, value=40000                   
1 row(s)
Took 0.0034 seconds 
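
Unlike the row-level filter in the earlier examples, ValueFilter works cell by cell, which is why only office:salary appears in the output. A minimal Ruby sketch of that behaviour, assuming the same sample data flattened into cells (EMP_CELLS and value_prefix_filter are hypothetical names used only for illustration):

```ruby
# Each cell is [row key, column, value], copied from the 'emp' scan output.
EMP_CELLS = [
  ['1', 'office:age', '20'],
  ['1', 'office:name', 'Scott'],
  ['2', 'office:age', '50'],
  ['2', 'office:gender', 'M'],
  ['2', 'office:name', 'Mark'],
  ['3', 'office:age', '30'],
  ['3', 'office:name', 'Jeff'],
  ['3', 'office:salary', '40000']
].freeze

# Keep only the cells whose value begins with the given prefix,
# regardless of which column they belong to.
def value_prefix_filter(cells, prefix)
  cells.select { |_row, _column, value| value.start_with?(prefix) }
end

puts value_prefix_filter(EMP_CELLS, '40000').inspect
# [["3", "office:salary", "40000"]]

# A prefix match is looser than equality: '400000' would match as well.
extra = EMP_CELLS + [['4', 'office:salary', '400000']]
puts value_prefix_filter(extra, '40000').length  # 2
```

If an exact match is needed, the filter string can use the binary comparator in place of binaryprefix.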

