I don't know how you are trying to implement your solution (as a Nutch plugin, a Hadoop MapReduce job, or a single-process script), but I guess this information will be helpful:
As indicated in nutch-src/conf/gora-hbase-mapping.xml, batchId is mapped to HBase's column f:bid. You have to read it using Gora. Instances of WebPage have the method getBatchId(). Check the Avro WebPage definition and the compiled class.
When developing a plugin, you will most likely see a WebPage parameter in the plugin's interface.
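As a rough sketch of what that access could look like inside a plugin: the PageWithBatchId interface below is a hypothetical stand-in for Nutch's generated WebPage class, so the snippet compiles without Nutch on the classpath. It assumes getBatchId() returns a CharSequence (Avro's Utf8 also implements it) and that the value may be null for a page that has no batch assigned; check the compiled class for the real signature in your version.

```java
public class BatchIdAccess {

    /** Hypothetical stand-in for the single WebPage method used here. */
    public interface PageWithBatchId {
        CharSequence getBatchId();
    }

    /** Returns the batch id as a String, or the fallback when it is not set. */
    public static String batchIdOf(PageWithBatchId page, String fallback) {
        CharSequence raw = page.getBatchId();
        return raw == null ? fallback : raw.toString();
    }

    public static void main(String[] args) {
        // Made-up batch id, for illustration only.
        PageWithBatchId page = () -> "example-batch-id";
        System.out.println(batchIdOf(page, "<none>"));
    }
}
```

In a real plugin the page argument would be the WebPage instance handed to the plugin method, and only the batchIdOf logic would remain.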
If you want to access batchId in a raw way in HBase, just read the column f:bid and treat the value as raw text. If I am not wrong, Gora does not write additional information on strings (unlike when they are serialized).
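The raw read boils down to decoding the cell's bytes as text. The sketch below is plain Java with no HBase dependency: it splits the f:bid column name into family and qualifier and decodes a cell value as UTF-8. The UTF-8 encoding is an assumption, in line with the guess above that Gora stores strings without extra framing, and the sample value is made up. With the HBase client, the byte array would come from the Result of a Get on the row, via Result.getValue(family, qualifier).

```java
import java.nio.charset.StandardCharsets;

public class RawBatchId {

    /** Splits an HBase column name such as "f:bid" into {family, qualifier}. */
    public static String[] splitColumn(String column) {
        int sep = column.indexOf(':');
        if (sep < 0) {
            throw new IllegalArgumentException("expected family:qualifier, got " + column);
        }
        return new String[] { column.substring(0, sep), column.substring(sep + 1) };
    }

    /**
     * Decodes the raw cell bytes as UTF-8 text (assumed encoding).
     * A null input means the column was absent for that row.
     */
    public static String decode(byte[] cellValue) {
        return cellValue == null ? null : new String(cellValue, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String[] column = splitColumn("f:bid");
        // Stand-in for the bytes an HBase read would return; the value is made up.
        byte[] cell = "example-batch-id".getBytes(StandardCharsets.UTF_8);
        System.out.println(column[0] + " / " + column[1] + " = " + decode(cell));
    }
}
```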