Will HBase scan over rows that do not contain the families I want returned?

https://stackoverflow.com/questions/23277853

09-07-2023

Question

I'm using HBase CDH3, and I'm designing my HBase table. Let's say all my rowkeys are hashed, and I have 2 column families colFamA and colFamB. For each row, there will be values stored in either colFamA or colFamB, but not both.

If I set up a scanner to scan over every row, and I specify in my scanner

Scan scan = new Scan();
scan.addFamily(Bytes.toBytes("colFamA"));  // restrict the scan to colFamA
ResultScanner scanner = hTable.getScanner(scan);

so I only want colFamA values, and not colFamB values, will my scanner still have to scan over rows that contain no data for colFamA (i.e. rows with only colFamB values)? Will the fact that there is colFamB slow down this scan even though I'm not adding it as a column to be returned in my scan?

Solution

The one-word answer is NO.

The slightly longer answer: HBase does not touch unneeded families during a scan at all. Each column family is kept in its own separate storage, so there is no need to look inside a family that was not requested. If no family is specified, all families are scanned.

An even more detailed explanation: as far as I know, at least in HBase 0.96, there is a RegionScanner interface and a RegionScannerImpl class, which is a member of HRegion. The scanner's constructor checks which families are specified in your Scan object, and the list of additional scanners (one per store) is built from that families array.
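To illustrate the per-family storage idea, here is a toy model in plain Java. It is not HBase code: the class FamilyStoreSketch and its put, scan, and rowsRead methods are hypothetical names made up for this sketch, and a sorted map stands in for each family's store. It only shows why rows that live solely in colFamB add no work to a colFamA scan under that assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only (hypothetical, not the HBase API): one sorted map per column family.
class FamilyStoreSketch {
    private final Map<String, TreeMap<String, String>> stores = new TreeMap<>();
    // Counts entries read from any store, to make the scan cost visible.
    private int rowsRead = 0;

    public void put(String family, String rowKey, String value) {
        stores.computeIfAbsent(family, f -> new TreeMap<>()).put(rowKey, value);
    }

    // Reads only the stores of the requested families; other stores are never opened.
    public List<String> scan(String... families) {
        List<String> out = new ArrayList<>();
        for (String family : families) {
            TreeMap<String, String> store = stores.get(family);
            if (store == null) {
                continue; // no data was ever written to this family
            }
            for (Map.Entry<String, String> e : store.entrySet()) {
                rowsRead++;
                out.add(e.getKey() + "=" + e.getValue());
            }
        }
        return out;
    }

    public int rowsRead() {
        return rowsRead;
    }

    public static void main(String[] args) {
        FamilyStoreSketch table = new FamilyStoreSketch();
        table.put("colFamA", "row1", "a1");
        table.put("colFamB", "row2", "b2");
        table.put("colFamA", "row3", "a3");
        table.put("colFamB", "row4", "b4");

        System.out.println(table.scan("colFamA")); // prints [row1=a1, row3=a3]
        System.out.println(table.rowsRead());      // prints 2
    }
}
```

In this model, row2 and row4 are never visited, because the colFamB map is never opened when only colFamA is requested.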

Other tips

Data in HBase is stored in regions, and within each region every column family has its own store. Whenever you scan one column family, the scanner reads only the HFiles that belong to that column family's store. It won't read data from the remaining column families.

It will only return values in colFamA.

License: CC-BY-SA with attribution

Not affiliated with StackOverflow