Question

I'm trying to determine the write performance penalty of using column family compression on an HBase table. However, the API for initiating a flush (admin.flush(tableName)) is an asynchronous, non-blocking operation. How can I determine how long it takes to flush a batch of Puts to disk?

Update: I am currently targeting HBase 0.94.2, and http://archive.cloudera.com/cdh4/cdh/4/hbase-0.94.2-cdh4.2.0/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#flush%28String%29 confirms that admin.flush is asynchronous.

Solution

admin.flush(tableNameOrRegionName) is a synchronous operation, not an asynchronous one. Look here.
To find out how long your Put operation takes, you can simply run the program under the time command or, from Java, take System.nanoTime() before and after the call and subtract.
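
As a rough illustration of the System.nanoTime() approach, here is a minimal sketch. The FlushTimer class and its method names are hypothetical (they are not part of HBase), and the number only covers the flush if the call you wrap really blocks until the flush completes, which is the point disputed in this thread.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of HBase) for timing a blocking piece of work. */
public final class FlushTimer {
    private FlushTimer() {}

    /** Runs the work once and returns the elapsed wall-clock time in nanoseconds. */
    public static long elapsedNanos(Runnable work) {
        long start = System.nanoTime();   // monotonic clock, suited to measuring intervals
        work.run();
        return System.nanoTime() - start;
    }

    /** Same measurement, converted to whole milliseconds. */
    public static long elapsedMillis(Runnable work) {
        return TimeUnit.NANOSECONDS.toMillis(elapsedNanos(work));
    }
}
```

In practice the Runnable would contain the batch of Puts followed by the flush call, with their checked exceptions handled inside it. Running the same batch against a compressed and an uncompressed column family and comparing the two numbers gives the penalty the question asks about.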

Other suggestions

By calling flush you are forcing HBase to flush the memstore before it normally would, which results in suboptimal performance.

Why not just grep the region server logs to see how long each memstore flush takes?
cat hbase-regionserver*.log | grep "Finished memstore flush"

Each matching line gives you the size flushed in MB and the time the flush took.
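
If you want to aggregate those numbers rather than read them by eye, a small parser helps. The sketch below assumes each matching line looks roughly like "Finished memstore flush of ~128.0m/134217728, ... in 1523ms, ..."; that shape is an assumption and may differ between HBase versions, so check it against your own logs first. FlushLogParser is a hypothetical name, not an HBase class.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for region server "Finished memstore flush" log lines. */
public final class FlushLogParser {
    // Assumed line shape (verify against your logs):
    //   ... Finished memstore flush of ~<size>/<bytes>, ... in <N>ms, ...
    private static final Pattern FLUSH_LINE =
        Pattern.compile("Finished memstore flush of ~([^/\\s]+)/(\\d+).*?\\bin (\\d+)ms");

    private FlushLogParser() {}

    /** Flush duration in milliseconds, or empty if the line does not match. */
    public static OptionalLong flushMillis(String line) {
        Matcher m = FLUSH_LINE.matcher(line);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group(3))) : OptionalLong.empty();
    }

    /** Flushed size in bytes, or empty if the line does not match. */
    public static OptionalLong flushedBytes(String line) {
        Matcher m = FLUSH_LINE.matcher(line);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group(2))) : OptionalLong.empty();
    }
}
```

Feeding every line of the grep output through flushMillis and flushedBytes lets you compute bytes flushed per millisecond for each flush and compare runs with and without compression.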

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow