



HBase shell filters and their use in development

King Wang

January 3, 2022
General operations:
hbase(main)> status
hbase(main)> version
Create a namespace. A namespace is a logical grouping of tables; tables in the same namespace serve similar purposes, much like a database in a relational database system.
hbase(main):060:0> create_namespace 'test1'
Create a table in this namespace:
hbase(main):061:0> create 'test1:test','f1','f2'
Create a table whose column family keeps two versions:
hbase(main)> create 'scores',{NAME=>'course',VERSIONS=>2}
1) List the available tables (list) and describe one (describe)
hbase(main)> list
hbase(main)> describe 'member'
2) Create a table (create). Only the column families are declared at creation time; columns appear when data is put.
# Syntax: create <table>, {NAME => <family>, VERSIONS => <VERSIONS>}
# Example: create table t1 with two column families, f1 and f2, each keeping 2 versions
hbase(main)> create 't1',{NAME => 'f1', VERSIONS => 2},{NAME => 'f2', VERSIONS => 2}
3) Delete a table
This takes two steps: first disable, then drop.
# Example: delete table t1
hbase(main)> disable 't1'
hbase(main)> drop 't1'
4) View the structure of a table
# Syntax: describe <table>
# Example: view the structure of table t1
hbase(main)> describe 't1'
5) Modify the table structure (alter)
The table must be disabled before its structure is modified.
hbase(main)> alter 't1', {NAME => 'f1', VERSIONS => 5}
# Syntax: alter 't1', {NAME => 'f1'}, {NAME => 'f2', METHOD => 'delete'}
# Example: set the TTL of the column families of table test1 to 180 days (15552000 seconds)
hbase(main)> disable 'test1'
hbase(main)> alter 'test1',{NAME=>'body',TTL=>'15552000'},{NAME=>'meta', TTL=>'15552000'}
hbase(main)> enable 'test1'
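The TTL is given in seconds, so a value in days has to be converted first. A minimal sketch of that conversion in plain Java; the class and method names are illustrative and not part of HBase:

```java
import java.time.Duration;

final class TtlSeconds {
    // Converts a retention period in days to the number of seconds
    // that would be written into the TTL attribute.
    static long daysToTtlSeconds(long days) {
        return Duration.ofDays(days).getSeconds();
    }

    public static void main(String[] args) {
        // 180 days, as in the alter example above
        System.out.println(daysToTtlSeconds(180)); // prints 15552000
    }
}
```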
6) Add data (put)
# Syntax: put <table>,<rowkey>,<family:column>,<value>,<timestamp>
# Example: add a record to table t1 with rowkey rowkey001, column family f1, column col1, value value01, and the system default timestamp
hbase(main)> put 't1','rowkey001','f1:col1','value01'
Usage is straightforward.
7) Query data
a) Read one row (get)
# Syntax: get <table>,<rowkey>,[<family:column>,....]
# Example: read all column values of row rowkey001
hbase(main)> get 't1','rowkey001'
# Example: read column f1:col1 of row rowkey001 in table t1
hbase(main)> get 't1','rowkey001', 'f1:col1'
# Or:
hbase(main)> get 't1','rowkey001', {COLUMN=>'f1:col1'}
b) Scan a table (scan)
# Syntax: scan <table>, {COLUMNS => [ <family:column>,.... ], LIMIT => num}
# STARTROW, TIMERANGE, FILTER and other advanced options can also be added
# Example: scan the first 5 rows of table t1
hbase(main)> scan 't1',{LIMIT=>5}
# All rows of column family f1 in table test of namespace test1
hbase(main)> scan 'test1:test',{COLUMNS=>'f1'}
# The first row of column family f1 in table test of namespace test1
hbase(main)> scan 'test1:test',{COLUMNS=>'f1',LIMIT=>1}
# Return up to 2 versions of each cell
hbase(main)> scan 'scores',{VERSIONS=>2}
# Return up to 2 versions of each cell within a timestamp range
hbase(main)> scan 'scores',{TIMERANGE=>[1394097631386,1394097651029],VERSIONS=>2}
c) Count the rows in a table (count)
# Syntax: count <table>, {INTERVAL => intervalNum, CACHE => cacheNum}
# INTERVAL sets how many rows are counted between progress lines showing the current count and rowkey (default 1000); CACHE is the number of rows fetched per round trip (default 10), and raising it speeds up the count
# Example: count the rows of table t1, reporting every 100 rows with a cache of 500
hbase(main)> count 't1', {INTERVAL => 100, CACHE => 500}
8) Delete data
a) Delete one column value of a row (delete)
# Syntax: delete <table>, <rowkey>, <family:column>, <timestamp>; the column name must be specified
# Example: delete the data of f1:col1 in row rowkey001 of table t1
hbase(main)> delete 't1','rowkey001','f1:col1'
Note: this deletes all versions of column f1:col1 in that row.
b) Delete a row (deleteall)
# Syntax: deleteall <table>, <rowkey>, <family:column>, <timestamp>; the column name may be omitted, in which case the whole row is deleted
# Example: delete row rowkey001 of table t1
hbase(main)> deleteall 't1','rowkey001'
c) Delete all data in a table (truncate)
# Syntax: truncate <table>
# Internally this runs: disable table -> drop table -> create table
# Example: delete all data of table t1
hbase(main)> truncate 't1'
9) Check whether a table exists (exists)
hbase(main):019:0> exists 't1'
10) Check whether a table is enabled (is_enabled)
hbase(main):036:0> is_enabled 't1'
hbase(main)> create help
11) Check whether a table is enabled
hbase(main):034:0> is_enabled 'member'
Filters:

0. All filters run on the server side
1. List all filters
hbase(main):010:0> show_filters
2. Return only the key part of each cell, without the value
scan 'airline',{ FILTER => "KeyOnlyFilter()"}
3. Return only the first cell of each row
scan 'airline',{ FILTER => "FirstKeyOnlyFilter()"}
4. Filter by rowkey; supply the rowkey prefix
scan 'airline', {FILTER => "(PrefixFilter ('row2'))"}
5. Filter by qualifier; supply the qualifier prefix
scan 'airline', {FILTER => "(PrefixFilter ('row2')) AND ColumnPrefixFilter('destination')"}
6. Filter by several qualifiers; supply the qualifier prefixes
scan 'airline',{FILTER =>"MultipleColumnPrefixFilter('source','destination','date')"}
7. Return at most the given number of qualifiers
scan 'airline',{FILTER =>"ColumnCountGetFilter(2)"}
8. Limit how many rows are returned
scan 'airline',{FILTER => "PageFilter(1)"}
9. Stop scanning at the given row (inclusive)
scan 'airline',{FILTER =>"InclusiveStopFilter('row1')"}
10. Return only the data of the specified qualifier (other comparison operators such as > or < can replace =)
scan 'airline',{ FILTER =>"QualifierFilter(=,'binary:flightno')"}
11. Return the data whose value meets a condition (other comparison operators such as > or < can replace =)
scan 'airline', { COLUMNS =>'flightbetween:source', LIMIT => 4, FILTER => "ValueFilter( =, 'binaryprefix:hyd' )" }
The maximum number of versions kept for a row's cells is defined per column family through HColumnDescriptor; the default maximum is 1.
Setting the maximum number of versions to a very large value (hundreds or more) is not recommended unless the old data matters to you, because too many versions make the StoreFiles large.
Import data from an HDFS path into an HBase table (here the table is apply_info and the HDFS path is /user/data_temp/apply_info):
hbase org.apache.hadoop.hbase.mapreduce.Driver import apply_info /user/data_temp/apply_info
Create table statement in detail:
create 'testtable',{NAME => 'Toutiao', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '10', COMPRESSION => 'LZO', TTL => '30000', IN_MEMORY => 'false', BLOCKCACHE => 'false'},
{NAME => 'coulmn', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '30', COMPRESSION => 'LZO', TTL => '30000', IN_MEMORY => 'true'}
The attributes are as follows. REPLICATION_SCOPE: the replication scope (0 means the column family is not replicated to other clusters). The column families here are 'Toutiao' and 'coulmn'. VERSIONS: the number of historical versions to keep; the default is 1. TTL: the expiration time in seconds; by default data is kept permanently. COMPRESSION: the compression codec, LZO here, which assumes LZO is configured on the cluster.
BLOOMFILTER: the Bloom filter, which optimizes HBase random read performance. The possible values are NONE|ROW|ROWCOL; the default given here is NONE, and the parameter can be enabled for a single column family. With the filter enabled, get operations and some scan operations can skip store files that cannot contain the requested data, which reduces the number of actual IO operations and improves random read performance. The ROW type suits lookups by row only, and the ROWCOL type suits combined lookups by row and column, as follows:
ROW type: get 'NewsClickFeedback','row1'
ROWCOL type: get 'NewsClickFeedback','row1',{COLUMN => 'Toutiao'}
For workloads with random reads, enabling the ROW type is recommended: it trades space for time and improves random read performance.
COMPRESSION: data compression. HBase supports several compression codecs, which reduce storage space on the one hand and reduce network transfer and improve read efficiency on the other. HBase currently supports three compression algorithms: GZip | LZO | Snappy. They are compared on three aspects: compression ratio, encoding speed, and decoding speed.
Snappy has the lowest compression ratio but the highest encoding and decoding speed, and it also consumes the least CPU. Snappy is currently the general recommendation.
IN_MEMORY: whether the data stays resident in memory; the default is false. HBase provides a cache area for frequently accessed data. This area normally holds small, frequently accessed data, and a common use is metadata storage. By default the size of this area is JVM heap size * 0.2 * 0.25; with a 70G heap, that is about 3.5G. Note that HBase Meta metadata is stored in this area. If IN_MEMORY is set to true for business data and that data is large, Meta data gets evicted, which degrades the performance of the whole cluster, so set this parameter with great care.
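The sizing rule quoted above is plain arithmetic. A small sketch in Java, assuming the two fractions from the text (0.2 and 0.25) rather than reading any real HBase configuration; the class and method names are illustrative only:

```java
import java.util.Locale;

final class InMemoryCacheEstimate {
    // heapGb * blockCacheFraction * inMemoryFraction, following the
    // formula stated in the text. The fractions are inputs, not values
    // looked up from an HBase installation.
    static double inMemoryAreaGb(double heapGb, double blockCacheFraction, double inMemoryFraction) {
        return heapGb * blockCacheFraction * inMemoryFraction;
    }

    public static void main(String[] args) {
        double gb = inMemoryAreaGb(70, 0.2, 0.25);
        System.out.println(String.format(Locale.ROOT, "%.1f GB", gb)); // prints 3.5 GB
    }
}
```

With a 70G heap the product is 3.5G, so the area is only a small share of the heap, which is why large business tables marked IN_MEMORY can crowd out Meta data.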
BLOCKCACHE: whether the block cache is enabled; it is on by default.
TTL: the data expiration time in seconds; by default data is kept permanently. For many workloads some data does not need to be kept forever. Permanent storage makes the data grow without bound, which consumes storage space and also lowers query efficiency. If the expiration time is set

HBase Filter introduction

(1) Filter introduction




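  • ColumnPrefixFilter: column prefix filter
  • TimestampsFilter: timestamp filter
  • PageFilter: paging filter
  • MultipleColumnPrefixFilter: multiple column prefix filter
  • FamilyFilter: column family filter
  • SingleColumnValueFilter: single column value filter
  • RowFilter: row key filter
  • QualifierFilter: column (qualifier) filter
  • ValueFilter: value filter
  • PrefixFilter: row key prefix filter
  • SingleColumnValueExcludeFilter: single column value exclude filter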






(2) Filter classification



Comparison filters

RowFilter, FamilyFilter, QualifierFilter, ValueFilter, etc.

Special-purpose filters

SingleColumnValueFilter, SingleColumnValueExcludeFilter, PrefixFilter, ColumnPrefixFilter, PageFilter, etc.

(3) Operator types

  • NO_OP: no operation (the remaining operators are listed under (5))

(4) Comparator types

  • BinaryComparator: compares against the specified byte array in byte order, using Bytes.compareTo(byte[])
  • BinaryPrefixComparator: same as above, but only compares the leading bytes (the prefix)
  • BitComparator: compares bit by bit
  • LongComparator: compares values of type long
  • NullComparator: checks whether the given value is null
  • RegexStringComparator: compares against a regular expression; supports only the EQUAL and NOT_EQUAL operators
  • SubstringComparator: checks whether the supplied substring appears in the value

(5) Using comparison filters


Comparison operator (CompareFilter.CompareOp): defines the comparison relationship. The following values are available:

  • EQUAL: equal to
  • GREATER: greater than
  • GREATER_OR_EQUAL: greater than or equal to
  • LESS: less than
  • LESS_OR_EQUAL: less than or equal to
  • NOT_EQUAL: not equal to

Comparator (ByteArrayComparable): determines how the target is matched. The following subclasses are available:

  • BinaryComparator: matches the full byte array
  • BinaryPrefixComparator: matches a byte array prefix
  • BitComparator: not commonly used
  • NullComparator: not commonly used
  • RegexStringComparator: matches a regular expression
  • SubstringComparator: matches a substring
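To make the difference between the four commonly used comparators concrete, here is a plain-Java approximation of what each one tests. It does not use the HBase classes; the class and method names are illustrative, and the regex case assumes "found anywhere in the value" semantics and the substring case assumes case-insensitive matching.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;
import java.util.regex.Pattern;

final class ComparatorSketch {
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // BinaryComparator with EQUAL: the whole byte array must be identical.
    static boolean binaryEquals(byte[] value, byte[] expected) {
        return Arrays.equals(value, expected);
    }

    // BinaryPrefixComparator with EQUAL: only the leading bytes are compared.
    static boolean hasPrefix(byte[] value, byte[] prefix) {
        return value.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(value, prefix.length), prefix);
    }

    // SubstringComparator with EQUAL: case-insensitive containment (assumed).
    static boolean containsIgnoreCase(byte[] value, String needle) {
        String text = new String(value, StandardCharsets.UTF_8).toLowerCase(Locale.ROOT);
        return text.contains(needle.toLowerCase(Locale.ROOT));
    }

    // RegexStringComparator with EQUAL: the pattern is found somewhere in the value (assumed).
    static boolean regexFound(byte[] value, String regex) {
        return Pattern.compile(regex).matcher(new String(value, StandardCharsets.UTF_8)).find();
    }

    public static void main(String[] args) {
        System.out.println(binaryEquals(utf8("age"), utf8("age")));      // true
        System.out.println(hasPrefix(utf8("address"), utf8("add")));     // true
        System.out.println(containsIgnoreCase(utf8("HangZhou"), "zhou")); // true
        System.out.println(regexFound(utf8("38"), ".4"));                // false
    }
}
```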

1. Multiple filters: FilterList (the shell has no FilterList syntax; filter strings are combined with AND and OR instead). A FilterList represents a chain of filters. It holds a set of filters to apply to the target data set, and the filters are related by either "and" (FilterList.Operator.MUST_PASS_ALL) or "or" (FilterList.Operator.MUST_PASS_ONE).

// Combined filter: get all rows whose age is between 15 and 30
private static void scanFilter() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // AND: a row must pass every filter in the list
    FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
    // age >= 15
    SingleColumnValueFilter filter1 = new SingleColumnValueFilter("info".getBytes(), "age".getBytes(), CompareOp.GREATER_OR_EQUAL, "15".getBytes());
    // age <= 30
    SingleColumnValueFilter filter2 = new SingleColumnValueFilter("info".getBytes(), "age".getBytes(), CompareOp.LESS_OR_EQUAL, "30".getBytes());
    filterList.addFilter(filter1);
    filterList.addFilter(filter2);
    Scan scan = new Scan();
    // Attach the filter list to the scan
    scan.setFilter(filterList);
    ResultScanner rs = ht.getScanner(scan);
    for (Result result : rs) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                + new String(CellUtil.cloneFamily(cell)) + "\t"
                + new String(CellUtil.cloneQualifier(cell)) + "\t"
                + new String(CellUtil.cloneValue(cell), "UTF-8"));
        }
    }
    rs.close();
    ht.close();
}
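One caveat about the range check above: the ages are stored as strings, so the two comparisons work on bytes in lexicographic order, not on numbers. The following plain-Java sketch (no HBase classes; names are illustrative, and it needs Java 9+ for Arrays.compareUnsigned) mimics the two conditions joined by "and" and shows which string values pass:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class AgeRangeSketch {
    // Mirrors GREATER_OR_EQUAL min AND LESS_OR_EQUAL max on raw bytes.
    // The && plays the role of MUST_PASS_ALL.
    static boolean inRangeBytes(String age, String min, String max) {
        byte[] value = age.getBytes(StandardCharsets.UTF_8);
        boolean atLeastMin = Arrays.compareUnsigned(value, min.getBytes(StandardCharsets.UTF_8)) >= 0;
        boolean atMostMax = Arrays.compareUnsigned(value, max.getBytes(StandardCharsets.UTF_8)) <= 0;
        return atLeastMin && atMostMax;
    }

    public static void main(String[] args) {
        System.out.println(inRangeBytes("18", "15", "30"));  // true, as expected
        System.out.println(inRangeBytes("2", "15", "30"));   // true: "2" sorts between "15" and "30"
        System.out.println(inRangeBytes("100", "15", "30")); // false: "100" sorts before "15"
    }
}
```

If ages can have a different number of digits, storing them zero-padded or as fixed-width binary numbers keeps byte order and numeric order in agreement.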

2. Column value filter: SingleColumnValueFilter. Tests a column value for equality (CompareOp.EQUAL), inequality (CompareOp.NOT_EQUAL), or a one-sided range (such as CompareOp.GREATER). Constructors: 2.1. The comparison value is a byte array (not supported in the shell?): SingleColumnValueFilter(byte[] family, byte[] qualifier, CompareFilter.CompareOp compareOp, byte[] value)

// SingleColumnValueFilter example
private static void scanFilter01() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // Match rows whose info:age value equals "18"
    SingleColumnValueFilter scvf = new SingleColumnValueFilter("info".getBytes(), "age".getBytes(), CompareOp.EQUAL, "18".getBytes());
    Scan scan = new Scan();
    scan.setFilter(scvf);
    ResultScanner rs = ht.getScanner(scan);
    for (Result result : rs) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                + new String(CellUtil.cloneFamily(cell)) + "\t"
                + new String(CellUtil.cloneQualifier(cell)) + "\t"
                + new String(CellUtil.cloneValue(cell), "UTF-8"));
        }
    }
    rs.close();
    ht.close();
}

2.2. The comparison value is a comparator (ByteArrayComparable): SingleColumnValueFilter(byte[] family, byte[] qualifier, CompareFilter.CompareOp compareOp, ByteArrayComparable comparator)

hbase(main):032:0> scan 'users',{FILTER=>"SingleColumnValueFilter('info','age',=,'regexstring:.4')"}
xiaoming01 column=address:contry, timestamp=1442000277200, value=\xE4\xB8\xAD\xE5\x9B\xBD
xiaoming01 column=address:country, timestamp=1442000228945, value=\xE4\xB8\xAD\xE5\x9B\xBD
xiaoming01 column=info:age, timestamp=1441998917568, value=24
xiaoming02 column=info:age, timestamp=1441998917594, value=24
xiaoming03 column=info:age, timestamp=1441998919607, value=24
3 row(s) in 0.0130 seconds
// SingleColumnValueFilter example 2 -- RegexStringComparator
private static void scanFilter02() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // Compare the value against a regular expression -- RegexStringComparator
    // Match info:age values that fit ".4" (any character followed by "4")
    RegexStringComparator comparator = new RegexStringComparator(".4");
    // Only the fourth argument differs: a comparator instead of a byte array
    SingleColumnValueFilter scvf = new SingleColumnValueFilter("info".getBytes(), "age".getBytes(), CompareOp.EQUAL, comparator);
    Scan scan = new Scan();
    scan.setFilter(scvf);
    ResultScanner rs = ht.getScanner(scan);
    for (Result result : rs) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                + new String(CellUtil.cloneFamily(cell)) + "\t"
                + new String(CellUtil.cloneQualifier(cell)) + "\t"
                + new String(CellUtil.cloneValue(cell), "UTF-8"));
        }
    }
    rs.close();
    ht.close();
}
// SingleColumnValueFilter example 3 -- SubstringComparator
private static void scanFilter03() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // Check whether a substring occurs in the value (case-insensitive) -- SubstringComparator
    // Match rows whose info:age value contains "4"
    SubstringComparator comparator = new SubstringComparator("4");
    // Only the fourth argument differs: a comparator instead of a byte array
    SingleColumnValueFilter scvf = new SingleColumnValueFilter("info".getBytes(), "age".getBytes(), CompareOp.EQUAL, comparator);
    Scan scan = new Scan();
    scan.setFilter(scvf);
    ResultScanner rs = ht.getScanner(scan);
    for (Result result : rs) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                + new String(CellUtil.cloneFamily(cell)) + "\t"
                + new String(CellUtil.cloneQualifier(cell)) + "\t"
                + new String(CellUtil.cloneValue(cell), "UTF-8"));
        }
    }
    rs.close();
    ht.close();
}
hbase(main):033:0> scan 'users',{FILTER=>"SingleColumnValueFilter('info','age',=,'substring:4')"}
xiaoming01 column=address:contry, timestamp=1442000277200, value=\xE4\xB8\xAD\xE5\x9B\xBD
xiaoming01 column=address:country, timestamp=1442000228945, value=\xE4\xB8\xAD\xE5\x9B\xBD
xiaoming01 column=info:age, timestamp=1441998917568, value=24
xiaoming02 column=info:age, timestamp=1441998917594, value=24
xiaoming03 column=info:age, timestamp=1441998919607, value=24
3 row(s) in 0.0180 seconds


3. Column name filters. Because HBase stores its data internally as key-value pairs, column name filters test whether a column name (ColumnFamily:Qualifier) exists in a row, in contrast to the column value filters of the previous section.

3.1. Filtering data by column family: FamilyFilter. Constructor: FamilyFilter(CompareFilter.CompareOp familyCompareOp, ByteArrayComparable familyComparator)

Note: 1. If you are looking for a known column family, scan.addFamily(family); is more efficient than a filter. 2. HBase's support for multiple column families is currently limited, so this filter is not widely used.

// Filter data by column family: FamilyFilter
private static void scanFilter04() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // Match the column family named exactly 'address'
    //FamilyFilter familyFilter = new FamilyFilter(CompareOp.EQUAL, new BinaryComparator("address".getBytes()));
    // Match column families whose name starts with 'add'
    FamilyFilter familyFilter = new FamilyFilter(CompareOp.EQUAL, new BinaryPrefixComparator("add".getBytes()));
    Scan scan = new Scan();
    scan.setFilter(familyFilter);
    ResultScanner rs = ht.getScanner(scan);
    for (Result result : rs) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                + new String(CellUtil.cloneFamily(cell)) + "\t"
                + new String(CellUtil.cloneQualifier(cell)) + "\t"
                + new String(CellUtil.cloneValue(cell), "UTF-8"));
        }
    }
    rs.close();
    ht.close();
}
hbase(main):021:0> scan 'users',{FILTER=>"FamilyFilter(=,'binaryprefix:add')"}
xiaoming column=address:city, timestamp=1441997498965, value=hangzhou
xiaoming column=address:contry, timestamp=1441997498911, value=china
xiaoming column=address:province, timestamp=1441997498939, value=zhejiang
xiaoming01 column=address:contry, timestamp=1442000277200, value=\xE4\xB8\xAD\xE5\x9B\xBD
xiaoming01 column=address:country, timestamp=1442000228945, value=\xE4\xB8\xAD\xE5\x9B\xBD
zhangyifei column=address:city, timestamp=1441997499108, value=jieyang
zhangyifei column=address:contry, timestamp=1441997499077, value=china
zhangyifei column=address:province, timestamp=1441997499093, value=guangdong
zhangyifei column=address:town, timestamp=1441997500711, value=xianqiao
3 row(s) in 0.0400 seconds


3.2. Filtering data by qualifier (column name): QualifierFilter. Constructor: QualifierFilter(CompareFilter.CompareOp op, ByteArrayComparable qualifierComparator)

Note: this filter is likely to be used more often than FamilyFilter.

// Filter data by qualifier (column name): QualifierFilter
private static void scanFilter05() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // Match columns whose name equals 'age'
    //QualifierFilter qualifierFilter = new QualifierFilter(CompareOp.EQUAL, new BinaryComparator("age".getBytes()));
    // Match columns whose name starts with 'age' (including 'age' itself)
    //QualifierFilter qualifierFilter = new QualifierFilter(CompareOp.EQUAL, new BinaryPrefixComparator("age".getBytes()));
    // Match columns whose name contains 'age' (including 'age' itself)
    //QualifierFilter qualifierFilter = new QualifierFilter(CompareOp.EQUAL, new SubstringComparator("age"));
    // Match columns whose name fits the regular expression '.ge'
    QualifierFilter qualifierFilter = new QualifierFilter(CompareOp.EQUAL, new RegexStringComparator(".ge"));
    Scan scan = new Scan();
    scan.setFilter(qualifierFilter);
    ResultScanner rs = ht.getScanner(scan);
    for (Result result : rs) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                + new String(CellUtil.cloneFamily(cell)) + "\t"
                + new String(CellUtil.cloneQualifier(cell)) + "\t"
                + new String(CellUtil.cloneValue(cell), "UTF-8"));
        }
    }
    rs.close();
    ht.close();
}
hbase(main):020:0> scan 'users',{FILTER=>"QualifierFilter(=,'regexstring:.ge')"}
xiaoming column=info:age, timestamp=1441997971945, value=38
xiaoming01 column=info:age, timestamp=1441998917568, value=24
xiaoming02 column=info:age, timestamp=1441998917594, value=24
xiaoming03 column=info:age, timestamp=1441998919607, value=24
zhangyifei column=info:age, timestamp=1442247255446, value=18
5 row(s) in 0.0460 seconds


3.3. Filtering data by column name prefix: ColumnPrefixFilter (the same result can also be achieved with QualifierFilter). Constructor: ColumnPrefixFilter(byte[] prefix). Note: a column name can appear in several column families, and this filter returns the matching columns from all of them.

// ColumnPrefixFilter example
private static void scanFilter06() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // Match all columns whose name starts with 'ag'
    ColumnPrefixFilter columnPrefixFilter = new ColumnPrefixFilter("ag".getBytes());
    Scan scan = new Scan();
    scan.setFilter(columnPrefixFilter);
    ResultScanner rs = ht.getScanner(scan);
    for (Result result : rs) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                + new String(CellUtil.cloneFamily(cell)) + "\t"
                + new String(CellUtil.cloneQualifier(cell)) + "\t"
                + new String(CellUtil.cloneValue(cell), "UTF-8"));
        }
    }
    rs.close();
    ht.close();
}
hbase(main):018:0> scan 'users',{FILTER=>"ColumnPrefixFilter('ag')"}
xiaoming column=info:age, timestamp=1441997971945, value=38
xiaoming01 column=info:age, timestamp=1441998917568, value=24
xiaoming02 column=info:age, timestamp=1441998917594, value=24
xiaoming03 column=info:age, timestamp=1441998919607, value=24
zhangyifei column=info:age, timestamp=1442247255446, value=18
5 row(s) in 0.0280 seconds


3.4. Filtering data by several column name prefixes: MultipleColumnPrefixFilter. MultipleColumnPrefixFilter behaves much like ColumnPrefixFilter, but accepts several prefixes.

// MultipleColumnPrefixFilter example
private static void scanFilter07() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // Match all columns whose name starts with 'a' or 'c' (passed as a two-dimensional byte array)
    byte[][] prefixes = new byte[][]{"a".getBytes(), "c".getBytes()};
    MultipleColumnPrefixFilter multipleColumnPrefixFilter = new MultipleColumnPrefixFilter(prefixes);
    Scan scan = new Scan();
    scan.setFilter(multipleColumnPrefixFilter);
    ResultScanner rs = ht.getScanner(scan);
    for (Result result : rs) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                + new String(CellUtil.cloneFamily(cell)) + "\t"
                + new String(CellUtil.cloneQualifier(cell)) + "\t"
                + new String(CellUtil.cloneValue(cell), "UTF-8"));
        }
    }
    rs.close();
    ht.close();
}
hbase(main):017:0> scan 'users',{FILTER=>"MultipleColumnPrefixFilter('a','c')"}
xiaoming column=address:city, timestamp=1441997498965, value=hangzhou
xiaoming column=address:contry, timestamp=1441997498911, value=china
xiaoming column=info:age, timestamp=1441997971945, value=38
xiaoming column=info:company, timestamp=1441997498889, value=alibaba
xiaoming01 column=address:contry, timestamp=1442000277200, value=\xE4\xB8\xAD\xE5\x9B\xBD
xiaoming01 column=address:country, timestamp=1442000228945, value=\xE4\xB8\xAD\xE5\x9B\xBD
xiaoming01 column=info:age, timestamp=1441998917568, value=24
xiaoming02 column=info:age, timestamp=1441998917594, value=24
xiaoming03 column=info:age, timestamp=1441998919607, value=24
zhangyifei column=address:city, timestamp=1441997499108, value=jieyang
zhangyifei column=address:contry, timestamp=1441997499077, value=china
zhangyifei column=info:age, timestamp=1442247255446, value=18
zhangyifei column=info:company, timestamp=1441997499039, value=alibaba
5 row(s) in 0.0430 seconds


3.5. Filtering data by column range (not row range): ColumnRangeFilter

  1. Use it to fetch a range of columns. For example, if a row has a million columns but you only want the columns named from bbbb to dddd
  2. The filter was introduced in HBase 0.92
  3. A column name can appear in several column families, and this filter returns the matching columns from all of them

Constructor: ColumnRangeFilter(byte[] minColumn, boolean minColumnInclusive, byte[] maxColumn, boolean maxColumnInclusive). Parameters:

  • minColumn – lower bound of the column range; if null, there is no lower limit
  • minColumnInclusive – whether the range includes minColumn
  • maxColumn – upper bound of the column range; if null, there is no upper limit
  • maxColumnInclusive – whether the range includes maxColumn
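The four parameters behave like the bounds of a sorted-map range query. As a rough analogy in plain Java (not the HBase API; the class name and sample row are illustrative, and null bounds are not handled here), a TreeMap keyed by column name gives the same inclusive/exclusive semantics:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

final class ColumnRangeSketch {
    // Returns the columns of one row whose names fall in the given range.
    static NavigableMap<String, String> columnRange(NavigableMap<String, String> row,
            String minColumn, boolean minInclusive, String maxColumn, boolean maxInclusive) {
        return row.subMap(minColumn, minInclusive, maxColumn, maxInclusive);
    }

    // A hypothetical row with three columns, sorted by column name.
    static NavigableMap<String, String> sampleRow() {
        NavigableMap<String, String> row = new TreeMap<>();
        row.put("age", "38");
        row.put("birthday", "1987-06-17");
        row.put("company", "alibaba");
        return row;
    }

    public static void main(String[] args) {
        // From 'a' (inclusive) up to 'c' (exclusive): "company" sorts after "c" and is left out.
        System.out.println(columnRange(sampleRow(), "a", true, "c", false).keySet()); // [age, birthday]
    }
}
```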

// ColumnRangeFilter example
private static void scanFilter08() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // Match all columns from 'a' (inclusive) up to 'c' (exclusive)
    ColumnRangeFilter columnRangeFilter = new ColumnRangeFilter("a".getBytes(), true, "c".getBytes(), false);
    Scan scan = new Scan();
    scan.setFilter(columnRangeFilter);
    ResultScanner rs = ht.getScanner(scan);
    for (Result result : rs) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                + new String(CellUtil.cloneFamily(cell)) + "\t"
                + new String(CellUtil.cloneQualifier(cell)) + "\t"
                + new String(CellUtil.cloneValue(cell), "UTF-8"));
        }
    }
    rs.close();
    ht.close();
}
hbase(main):016:0> scan 'users',{FILTER=>"ColumnRangeFilter('a',true,'c',false)"}
xiaoming column=info:age, timestamp=1441997971945, value=38
xiaoming column=info:birthday, timestamp=1441997498851, value=1987-06-17
xiaoming01 column=info:age, timestamp=1441998917568, value=24
xiaoming02 column=info:age, timestamp=1441998917594, value=24
xiaoming03 column=info:age, timestamp=1441998919607, value=24
zhangyifei column=info:age, timestamp=1442247255446, value=18
zhangyifei column=info:birthday, timestamp=1441997498990, value=1987-4-17
5 row(s) in 0.0340 seconds


4. RowKey. When you need to find a range of rows by row key, using the Scan's startRow and stopRow is more efficient. However, startRow and stopRow can only match the leading characters of the row key, not characters in the middle. When more complex row key filtering is needed, use RowFilter. Constructor: RowFilter(CompareFilter.CompareOp rowCompareOp, ByteArrayComparable rowComparator)
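For the prefix case, the stop row can be derived from the start row by incrementing the last byte that is not 0xFF. A minimal sketch of that calculation in plain Java; the class and method names are illustrative, and the resulting arrays would be passed to the scan as start row (the prefix itself) and stop row:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PrefixScanSketch {
    // Returns the smallest row key that is greater than every key starting
    // with the prefix. An empty array means "no upper bound".
    static byte[] stopRowForPrefix(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++; // bump the last byte that can still be incremented
                return stop;
            }
        }
        return new byte[0]; // prefix is empty or all 0xFF bytes
    }

    public static void main(String[] args) {
        byte[] stop = stopRowForPrefix("xiaoming".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(stop, StandardCharsets.UTF_8)); // xiaominh
    }
}
```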

// RowFilter example
private static void scanFilter09() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // Match all rows whose row key contains '01'
    RowFilter rowFilter = new RowFilter(CompareOp.EQUAL, new SubstringComparator("01"));
    Scan scan = new Scan();
    scan.setFilter(rowFilter);
    ResultScanner rs = ht.getScanner(scan);
    for (Result result : rs) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                + new String(CellUtil.cloneFamily(cell)) + "\t"
                + new String(CellUtil.cloneQualifier(cell)) + "\t"
                + new String(CellUtil.cloneValue(cell), "UTF-8"));
        }
    }
    rs.close();
    ht.close();
}
hbase(main):013:0> scan 'users',{FILTER=>"RowFilter(=,'substring:01')"}
xiaoming01 column=address:contry, timestamp=1442000277200, value=\xE4\xB8\xAD\xE5\x9B\xBD
xiaoming01 column=address:country, timestamp=1442000228945, value=\xE4\xB8\xAD\xE5\x9B\xBD
xiaoming01 column=info:age, timestamp=1441998917568, value=24
1 row(s) in 0.0190 seconds


5. PageFilter (not supported in the shell?). Specifies the page size in rows and returns a result set of that many rows. Note that the filter does not guarantee that the number of returned rows is at most the page size, because the filter is applied separately on each region server; it only guarantees that the rows returned from the current region do not exceed the page size. Constructor: PageFilter(long pageSize)

// PageFilter example
private static void scanFilter10() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // Starting at row key "xiaoming" (inclusive), take 3 rows
    PageFilter pageFilter = new PageFilter(3L);
    Scan scan = new Scan();
    scan.setStartRow("xiaoming".getBytes());
    scan.setFilter(pageFilter);
    ResultScanner rs = ht.getScanner(scan);
    for (Result result : rs) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                + new String(CellUtil.cloneFamily(cell)) + "\t"
                + new String(CellUtil.cloneQualifier(cell)) + "\t"
                + new String(CellUtil.cloneValue(cell), "UTF-8"));
        }
    }
    rs.close();
    ht.close();
}

Note: because the filter does not guarantee that the number of returned rows is at most the page size, a better way to return a fixed number of rows is ResultScanner.next(int nbRows), that is:

// A modified version of the demo above
private static void scanFilter11() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // Starting at row key "xiaoming" (inclusive), take 3 rows
    //PageFilter pageFilter = new PageFilter(3L);
    Scan scan = new Scan();
    scan.setStartRow("xiaoming".getBytes());
    ResultScanner rs = ht.getScanner(scan);
    // Ask the scanner for exactly 3 rows
    for (Result result : rs.next(3)) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                + new String(CellUtil.cloneFamily(cell)) + "\t"
                + new String(CellUtil.cloneQualifier(cell)) + "\t"
                + new String(CellUtil.cloneValue(cell), "UTF-8"));
        }
    }
    rs.close();
    ht.close();
}
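Why the page size is only a per-region guarantee can be seen in a toy model. The sketch below is plain Java and deliberately simplified: each inner list stands for the rows of one region, the limit is applied to each region on its own, and the client then caps the merged result. The names are illustrative and the model is an assumption about the behaviour described above, not HBase code (it needs Java 9+ for List.of).

```java
import java.util.ArrayList;
import java.util.List;

final class PageFilterSketch {
    // Applies the page size independently to every region and merges the results.
    static List<String> scanAllRegions(List<List<String>> regions, int pageSize) {
        List<String> merged = new ArrayList<>();
        for (List<String> region : regions) {
            merged.addAll(region.subList(0, Math.min(pageSize, region.size())));
        }
        return merged;
    }

    // Client-side cap, comparable in spirit to asking the scanner for n rows.
    static List<String> capRows(List<String> rows, int maxRows) {
        return new ArrayList<>(rows.subList(0, Math.min(maxRows, rows.size())));
    }

    public static void main(String[] args) {
        List<List<String>> regions = List.of(
                List.of("r1", "r2", "r3", "r4"),
                List.of("r5", "r6"));
        List<String> merged = scanAllRegions(regions, 3);
        System.out.println(merged);             // [r1, r2, r3, r5, r6] : more than 3 rows
        System.out.println(capRows(merged, 3)); // [r1, r2, r3]
    }
}
```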

6. SkipFilter (not supported in the shell). Filters on every column of a whole row: if any single column fails the condition, the entire row is filtered out. Constructor: SkipFilter(Filter filter)

For example, suppose all the columns of a row represent the weights of different items. In the real world these values must be greater than zero, and we want to filter out every row in which any column value is 0. In this case we combine ValueFilter and SkipFilter: scan.setFilter(new SkipFilter(new ValueFilter(CompareOp.NOT_EQUAL, new BinaryComparator(Bytes.toBytes(0)))));
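The "one failing column drops the whole row" rule can be illustrated without HBase. In this plain-Java sketch each row is a map from column name to weight, and a row is kept only when every cell passes the wrapped condition; the class name, method names and sample data are all hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntPredicate;

final class SkipFilterSketch {
    // Keeps a row only if all of its cell values pass the condition.
    static Map<String, Map<String, Integer>> keepRowsWhereAllCellsPass(
            Map<String, Map<String, Integer>> rows, IntPredicate cellPasses) {
        Map<String, Map<String, Integer>> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Integer>> row : rows.entrySet()) {
            boolean allPass = row.getValue().values().stream()
                    .mapToInt(Integer::intValue)
                    .allMatch(cellPasses);
            if (allPass) {
                kept.put(row.getKey(), row.getValue());
            }
        }
        return kept;
    }

    // Two hypothetical rows of item weights; item2 contains a zero.
    static Map<String, Map<String, Integer>> sampleRows() {
        Map<String, Integer> item1 = new LinkedHashMap<>();
        item1.put("apple", 3);
        item1.put("pear", 5);
        Map<String, Integer> item2 = new LinkedHashMap<>();
        item2.put("apple", 0);
        item2.put("pear", 7);
        Map<String, Map<String, Integer>> rows = new LinkedHashMap<>();
        rows.put("item1", item1);
        rows.put("item2", item2);
        return rows;
    }

    public static void main(String[] args) {
        // The condition "value != 0" plays the role of the wrapped ValueFilter.
        System.out.println(keepRowsWhereAllCellsPass(sampleRows(), v -> v != 0).keySet()); // [item1]
    }
}
```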

// SkipFilter example
private static void scanFilter12() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // Skip every row that has any column whose value equals "24"
    SkipFilter skipFilter = new SkipFilter(new ValueFilter(CompareOp.NOT_EQUAL, new BinaryComparator("24".getBytes())));
    Scan scan = new Scan();
    scan.setFilter(skipFilter);
    ResultScanner rs = ht.getScanner(scan);
    for (Result result : rs) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                + new String(CellUtil.cloneFamily(cell)) + "\t"
                + new String(CellUtil.cloneQualifier(cell)) + "\t"
                + new String(CellUtil.cloneValue(cell), "UTF-8"));
        }
    }
    rs.close();
    ht.close();
}

7. Utility filter: FirstKeyOnlyFilter. This filter returns only the first cell of each row, which makes it useful for efficient row counting. Beyond that it probably sees little practical use. Constructor: public FirstKeyOnlyFilter()

// FirstKeyOnlyFilter example
private static void scanFilter13() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // Return only the first cell of each row
    FirstKeyOnlyFilter firstKeyOnlyFilter = new FirstKeyOnlyFilter();
    Scan scan = new Scan();
    scan.setFilter(firstKeyOnlyFilter);
    ResultScanner rs = ht.getScanner(scan);
    int i = 0;
    for (Result result : rs) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                    + new String(CellUtil.cloneFamily(cell)) + "\t"
                    + new String(CellUtil.cloneQualifier(cell)) + "\t"
                    + new String(CellUtil.cloneValue(cell), "UTF-8"));
        }
        // One Result per row, so this counts rows
        i++;
    }
    // Print the total number of rows
    System.out.println(i);
    rs.close();
    ht.close();
}
hbase(main):009:0> scan 'users',{FILTER=>'FirstKeyOnlyFilter()'}
xiaoming column=address:city, timestamp=1441997498965, value=hangzhou
xiaoming01 column=address:contry, timestamp=1442000277200, value=\xE4\xB8\xAD\xE5\x9B\xBD
xiaoming02 column=info:age, timestamp=1441998917594, value=24
xiaoming03 column=info:age, timestamp=1441998919607, value=24
zhangyifei column=address:city, timestamp=1441997499108, value=jieyang
5 row(s) in 0.0240 seconds

Common filters API

package com.aura.hbase.test;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.BinaryPrefixComparator;
import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FamilyFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.MultipleColumnPrefixFilter;
import org.apache.hadoop.hbase.filter.PageFilter;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.filter.QualifierFilter;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueExcludeFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.filter.ValueFilter;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Test;

public class HBaseFilterTest {

    public static final String ZOOKEEPER_LIST = "node01:2181,node02:2181,node03:2181";
    public static final String TABLE_NAME = "user_info";
    public static Configuration conf = null;
    public static Admin admin = null;
    public static Table table = null;

    static {
        conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", ZOOKEEPER_LIST);
        try {
            Connection conn = ConnectionFactory.createConnection(conf);
            admin = conn.getAdmin();
            table = conn.getTable(TableName.valueOf(TABLE_NAME));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Stand-in for the author's com.aura.hbase.utils.HBasePrintUtil helper, whose source
    // is not shown here: prints every cell of every row, then closes the scanner.
    private static void printAndClose(ResultScanner scanner) {
        try {
            for (Result result : scanner) {
                for (Cell cell : result.rawCells()) {
                    System.out.println(Bytes.toString(CellUtil.cloneRow(cell)) + "\t"
                            + Bytes.toString(CellUtil.cloneFamily(cell)) + "\t"
                            + Bytes.toString(CellUtil.cloneQualifier(cell)) + "\t"
                            + Bytes.toString(CellUtil.cloneValue(cell)));
                }
            }
        } finally {
            scanner.close();
        }
    }

    /** Full-table scan restricted to one column family. */
    @Test
    public void testScanWithFamily() throws Exception {
        Scan scan = new Scan();
        // "base_info" is assumed here; it is the family used by the other tests
        scan.addFamily(Bytes.toBytes("base_info"));
        ResultScanner scanner = table.getScanner(scan);
        printAndClose(scanner);
    }

    /** Full-table scan restricted to one column family and one column. */
    @Test
    public void testScanWithColumn() throws Exception {
        Scan scan = new Scan();
        scan.addColumn("base_info".getBytes(), "name".getBytes());
        ResultScanner scanner = table.getScanner(scan);
        printAndClose(scanner);
    }

    /** Full-table scan for cells with a given timestamp or timestamp range. */
    @Test
    public void testScanWithTimestamp() throws Exception {
        Scan scan = new Scan();
        // Exact timestamp: matches a single version
        // scan.setTimeStamp(1514443301587L);
        // Timestamp range [min, max): matches one or more versions
        scan.setTimeRange(1514443301340L, 1514443301587L);
        ResultScanner scanner = table.getScanner(scan);
        printAndClose(scanner);
    }

    /**
     * Scan a single row key or a row key range.
     * With only startRow set, the scan runs from startRow (inclusive) to the end of the table.
     * With only stopRow set, the scan runs from the start of the table up to stopRow,
     * not including the stopRow row itself.
     */
    @Test
    public void testScanWithRowkey() throws Exception {
        Scan scan = new Scan();
        // Example row keys (assumed); substitute keys from your own table
        scan.setStartRow(Bytes.toBytes("baiyc_20150716_0001"));
        scan.setStopRow(Bytes.toBytes("baiyc_20150716_0003"));
        ResultScanner scanner = table.getScanner(scan);
        printAndClose(scanner);
    }

    /** RowFilter: rows whose row key is less than or equal to "baiyc_20150716_0003". */
    @Test
    public void testRowFilter() throws Exception {
        Scan scan = new Scan();
        Filter filter = new RowFilter(CompareOp.LESS_OR_EQUAL, new BinaryComparator(Bytes.toBytes("baiyc_20150716_0003")));
        scan.setFilter(filter);
        ResultScanner scanner = table.getScanner(scan);
        printAndClose(scanner);
    }

    /** FamilyFilter: column families that sort after "base_info". */
    @Test
    public void testFamilyFilter() throws Exception {
        Scan scan = new Scan();
        Filter filter = new FamilyFilter(CompareOp.GREATER, new BinaryComparator(Bytes.toBytes("base_info")));
        scan.setFilter(filter);
        ResultScanner scanner = table.getScanner(scan);
        printAndClose(scanner);
    }

    /** QualifierFilter: columns whose name equals "name". */
    @Test
    public void testQualifierFilter() throws Exception {
        Scan scan = new Scan();
        // BinaryComparator: the column name must match exactly
        Filter filter = new QualifierFilter(CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("name")));
        // BinaryPrefixComparator: column names that start with "na"
        // Filter filter = new QualifierFilter(CompareOp.EQUAL, new BinaryPrefixComparator(Bytes.toBytes("na")));
        // RegexStringComparator: column names that match the regular expression "na."
        // Filter filter = new QualifierFilter(CompareOp.EQUAL, new RegexStringComparator("na."));
        scan.setFilter(filter);
        ResultScanner scanner = table.getScanner(scan);
        printAndClose(scanner);
    }

    /** ValueFilter: cells whose value contains the substring "mus". */
    @Test
    public void testValueFilter() throws Exception {
        Scan scan = new Scan();
        Filter filter = new ValueFilter(CompareOp.EQUAL, new SubstringComparator("mus"));
        scan.setFilter(filter);
        ResultScanner scanner = table.getScanner(scan);
        printAndClose(scanner);
    }

    /** FilterList: apply several filters at once (all must pass by default). */
    @Test
    public void testFilterList() throws Exception {
        Scan scan = new Scan();
        Filter filter1 = new FamilyFilter(CompareOp.GREATER, new BinaryComparator(Bytes.toBytes("base_info")));
        Filter filter2 = new ValueFilter(CompareOp.NOT_EQUAL, new BinaryComparator(Bytes.toBytes("music")));
        FilterList list = new FilterList(filter1, filter2);
        scan.setFilter(list);
        ResultScanner scanner = table.getScanner(scan);
        printAndClose(scanner);
    }

    /** PageFilter: starting at a given row key, return a fixed number of rows. */
    @Test
    public void testPageFilter() throws Exception {
        Scan scan = new Scan();
        // Show 4 rows per page
        Filter filter = new PageFilter(4);
        scan.setFilter(filter);
        // Set the starting row key (example value, assumed)
        scan.setStartRow(Bytes.toBytes("baiyc_20150716_0003"));
        ResultScanner scanner = table.getScanner(scan);
        printAndClose(scanner);
    }

    /**
     * SingleColumnValueFilter: returns the whole row when the given column matches.
     * Here: family "base_info", column "name", value contains the substring "zhangsan".
     */
    @Test
    public void testSingleColumnValueFilter() throws Exception {
        Scan scan = new Scan();
        SingleColumnValueFilter filter = new SingleColumnValueFilter(
                Bytes.toBytes("base_info"), Bytes.toBytes("name"), CompareOp.EQUAL,
                new SubstringComparator("zhangsan"));
        // If this is not set to true, rows that do not have the column at all are also returned.
        // For example, a row with no "name" column, whose other values never contain "zhangsan",
        // would still come back. With true, only rows that have "name" and match are returned.
        filter.setFilterIfMissing(true);
        scan.setFilter(filter);
        ResultScanner scanner = table.getScanner(scan);
        printAndClose(scanner);
    }

    /**
     * SingleColumnValueExcludeFilter: same rows as the filter above,
     * but the tested "name" column is left out of the result.
     */
    @Test
    public void testSingleColumnValueExcludeFilter() throws Exception {
        Scan scan = new Scan();
        SingleColumnValueExcludeFilter filter = new SingleColumnValueExcludeFilter(
                Bytes.toBytes("base_info"), Bytes.toBytes("name"), CompareOp.EQUAL,
                new SubstringComparator("zhangsan"));
        filter.setFilterIfMissing(true);
        scan.setFilter(filter);
        ResultScanner scanner = table.getScanner(scan);
        printAndClose(scanner);
    }

    /** PrefixFilter: rows whose row key starts with "baiyc". */
    @Test
    public void testPrefixFilter() throws Exception {
        Scan scan = new Scan();
        Filter filter = new PrefixFilter(Bytes.toBytes("baiyc"));
        scan.setFilter(filter);
        ResultScanner scanner = table.getScanner(scan);
        printAndClose(scanner);
    }

    /** ColumnPrefixFilter: columns whose name starts with "na". */
    @Test
    public void testColumnPrefixFilter() throws Exception {
        Scan scan = new Scan();
        Filter filter = new ColumnPrefixFilter(Bytes.toBytes("na"));
        scan.setFilter(filter);
        ResultScanner scanner = table.getScanner(scan);
        printAndClose(scanner);
    }

    /** MultipleColumnPrefixFilter: columns whose name starts with "na" or with "ag". */
    @Test
    public void testMultipleColumnPrefixFilter() throws Exception {
        Scan scan = new Scan();
        byte[][] prefixes = new byte[][] {Bytes.toBytes("na"), Bytes.toBytes("ag")};
        Filter filter = new MultipleColumnPrefixFilter(prefixes);
        scan.setFilter(filter);
        ResultScanner scanner = table.getScanner(scan);
        printAndClose(scanner);
    }
}
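
The start/stop row rules and the RowFilter comparison above both come down to how row keys are ordered: byte by byte, as unsigned values. Below is a rough plain-Java model of that ordering. It does not call HBase at all: RowRangeSketch and its methods are hypothetical, and the sample keys only follow the "baiyc_20150716_0003" pattern used above. It assumes the usual scan semantics of an inclusive start row and an exclusive stop row.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class RowRangeSketch {

    // Unsigned, byte-by-byte comparison; a shorter key that is a prefix sorts first.
    static int compareKeys(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xff) - (y[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return x.length - y.length;
    }

    // A pretend table: just a sorted set of made-up row keys.
    static TreeSet<String> sampleTable() {
        Comparator<String> byBytes = RowRangeSketch::compareKeys;
        TreeSet<String> table = new TreeSet<>(byBytes);
        table.addAll(Arrays.asList(
                "baiyc_20150716_0001", "baiyc_20150716_0002",
                "baiyc_20150716_0003", "baiyc_20150716_0004"));
        return table;
    }

    // startRow is inclusive, stopRow is exclusive; a null stopRow means "to the end of the table".
    static List<String> scan(TreeSet<String> table, String startRow, String stopRow) {
        if (stopRow == null) {
            return new ArrayList<>(table.tailSet(startRow, true));
        }
        return new ArrayList<>(table.subSet(startRow, true, stopRow, false));
    }

    // Models a LESS_OR_EQUAL row comparison against one key.
    static List<String> lessOrEqual(TreeSet<String> table, String key) {
        return new ArrayList<>(table.headSet(key, true));
    }

    public static void main(String[] args) {
        TreeSet<String> table = sampleTable();
        // The stop row itself is not returned
        System.out.println(scan(table, "baiyc_20150716_0002", "baiyc_20150716_0004"));
        // With no stop row, the last row of the table is included
        System.out.println(scan(table, "baiyc_20150716_0003", null));
        System.out.println(lessOrEqual(table, "baiyc_20150716_0003"));
        // Byte order is not numeric order: "row10" sorts before "row2"
        System.out.println(compareKeys("row10", "row2") < 0); // true
    }
}
```

The last line is the reason numeric parts of row keys are usually zero-padded, as in the _0001 to _0004 suffixes here.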

