In this tutorial, you will learn how to use HBase Scan filters
to select rows from a table by applying predicate conditions on columns, similar to the WHERE
clause in SQL. To use filters, you need to import certain Java classes into the HBase Shell.
First, let’s use scan to print the data we are going to work with. If you don’t have this data yet, insert it into the HBase table first.
As covered in previous chapters, scan reads data from an HBase table.
hbase> scan 'emp'
ROW COLUMN+CELL
1 column=office:age, timestamp=1567542138673, value=20
1 column=office:name, timestamp=1567541857878, value=Scott
2 column=office:age, timestamp=1567541901009, value=50
2 column=office:gender, timestamp=1567541880523, value=M
2 column=office:name, timestamp=1567541868638, value=Mark
3 column=office:age, timestamp=1567542149583, value=30
3 column=office:name, timestamp=1567542103821, value=Jeff
3 column=office:salary, timestamp=1567542130044, value=40000
3 row(s)
Took 0.0823 seconds
SingleColumnValueFilter
To filter rows in the HBase shell using scan, import the org.apache.hadoop.hbase.filter.SingleColumnValueFilter
class along with the CompareFilter and BinaryComparator classes, as shown below.
hbase> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
=> [Java::OrgApacheHadoopHbaseFilter::SingleColumnValueFilter]
hbase> import org.apache.hadoop.hbase.filter.CompareFilter
=> [Java::OrgApacheHadoopHbaseFilter::CompareFilter]
hbase> import org.apache.hadoop.hbase.filter.BinaryComparator
=> [Java::OrgApacheHadoopHbaseFilter::BinaryComparator]
Now, let’s run some filter examples.
Example 1: This example returns the row where name equals ‘Jeff’ by using CompareFilter::CompareOp.valueOf('EQUAL') with BinaryComparator.new(Bytes.toBytes('Jeff')).
hbase> scan 'emp', { FILTER => SingleColumnValueFilter.new(Bytes.toBytes('office'), Bytes.toBytes('name'), CompareFilter::CompareOp.valueOf('EQUAL'), BinaryComparator.new(Bytes.toBytes('Jeff'))) }
ROW COLUMN+CELL
3 column=office:age, timestamp=1567542149583, value=30
3 column=office:name, timestamp=1567542103821, value=Jeff
3 column=office:salary, timestamp=1567542130044, value=40000
1 row(s)
Took 0.0480 seconds
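Example 2: Let’s see how to filter rows where age is greater than or equal to 50, using CompareFilter::CompareOp.valueOf('GREATER_OR_EQUAL') with BinaryComparator.new(Bytes.toBytes('50')). Because the age is stored as the string '50', BinaryComparator compares the values byte by byte rather than numerically.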
hbase> scan 'emp', { FILTER => SingleColumnValueFilter.new(Bytes.toBytes('office'), Bytes.toBytes('age'), CompareFilter::CompareOp.valueOf('GREATER_OR_EQUAL'), BinaryComparator.new(Bytes.toBytes('50'))) }
ROW COLUMN+CELL
2 column=office:age, timestamp=1567541901009, value=50
2 column=office:gender, timestamp=1567541880523, value=M
2 column=office:name, timestamp=1567541868638, value=Mark
1 row(s)
Took 0.0180 seconds
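The comparison in Example 2 works because every age in the table has two digits. BinaryComparator orders values by their raw bytes, so ages stored as strings are compared like text, not like numbers. The following is a minimal plain-Ruby sketch of that ordering, not HBase code; bytes_compare and greater_or_equal? are hypothetical helpers introduced only for illustration.

```ruby
# Plain-Ruby sketch (not HBase code): mimics a lexicographic,
# byte-by-byte comparison of two string values.
# bytes_compare and greater_or_equal? are hypothetical helper names.
def bytes_compare(a, b)
  # Array#<=> compares element by element, then by length.
  a.bytes <=> b.bytes
end

def greater_or_equal?(stored, threshold)
  bytes_compare(stored, threshold) >= 0
end

# Row key => age, copied from the 'emp' table shown earlier.
ages = { '1' => '20', '2' => '50', '3' => '30' }
matches = ages.select { |_row, age| greater_or_equal?(age, '50') }.keys
puts matches.inspect                      # ["2"]

# A three-digit age stored as a string sorts before '50' because '1' < '5'.
puts greater_or_equal?('100', '50')       # false

# Zero-padding both sides to the same width restores numeric ordering.
puts greater_or_equal?('100', '050')      # true
```

If the column can hold values of different lengths, one option is to store them zero-padded to a fixed width so that byte order and numeric order agree.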
Example 3: This example checks the values of all columns for 40000 and returns only the cells that match. The binaryprefix comparator matches any value that starts with 40000.
hbase> scan 'emp', {FILTER => "ValueFilter (=,'binaryprefix:40000')"}
ROW COLUMN+CELL
3 column=office:salary, timestamp=1567542130044, value=40000
1 row(s)
Took 0.0034 seconds
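Examples 1 and 3 return different shapes of result: SingleColumnValueFilter decides per row and emits every cell of a matching row, while ValueFilter decides per cell and emits only the matching cells. The plain-Ruby model below illustrates the difference on the same data. It is a sketch, not HBase code: EMP, single_column_value_filter, and value_prefix_filter are hypothetical names, and the filter_if_missing flag is an assumption meant to mirror the filter's documented default of emitting rows that lack the tested column.

```ruby
# Plain-Ruby model (not HBase code) of the 'emp' table:
# row key => { 'family:qualifier' => value }.
EMP = {
  '1' => { 'office:age' => '20', 'office:name' => 'Scott' },
  '2' => { 'office:age' => '50', 'office:gender' => 'M', 'office:name' => 'Mark' },
  '3' => { 'office:age' => '30', 'office:name' => 'Jeff', 'office:salary' => '40000' }
}.freeze

# Row-level decision, modeled on SingleColumnValueFilter with EQUAL:
# test one column, then keep or drop the whole row.
# Assumption: rows without the column are kept unless filter_if_missing is true.
def single_column_value_filter(table, column, value, filter_if_missing: false)
  table.select do |_row, cells|
    if cells.key?(column)
      cells[column] == value
    else
      !filter_if_missing
    end
  end
end

# Cell-level decision, modeled on ValueFilter with a binaryprefix comparator:
# keep only cells whose value starts with the prefix, then drop empty rows.
def value_prefix_filter(table, prefix)
  table.map { |row, cells| [row, cells.select { |_col, v| v.start_with?(prefix) }] }
       .reject { |_row, cells| cells.empty? }
       .to_h
end

by_name  = single_column_value_filter(EMP, 'office:name', 'Jeff')
by_value = value_prefix_filter(EMP, '40000')

puts by_name['3'].keys.inspect    # all three cells of row 3
puts by_value['3'].keys.inspect   # only the salary cell
```

In this model, filtering on office:gender equal to 'M' returns all three rows, because rows 1 and 3 have no gender column; passing filter_if_missing: true narrows the result to row 2.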