I'm trying to determine the write performance penalty of using column family compression on an HBase table. However, the API for initiating a flush (admin.flush(tableName)) is an asynchronous, non-blocking operation. How can I determine how long it takes to flush a batch of Puts to disk?

Update: I am currently targeting HBase 0.94.2, and the API documentation at http://archive.cloudera.com/cdh4/cdh/4/hbase-0.94.2-cdh4.2.0/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#flush%28String%29 confirms that admin.flush is asynchronous.

Solution

admin.flush(tableNameOrRegionName) is a synchronous operation, not an asynchronous one. Look here.
You can simply use the time command to measure how long your Put operation takes or, in Java, use System.nanoTime().
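As a rough illustration of the System.nanoTime() approach, here is a minimal sketch that times any blocking action. It assumes, as this answer states, that admin.flush returns only once the flush has finished; if it does not, the measured time will cover only the request. The class and interface names are hypothetical, the HBase calls appear only in comments (table, puts, admin and tableName are assumed to exist elsewhere), and the lambda syntax needs Java 8 or later.

```java
import java.util.concurrent.TimeUnit;

public class FlushTimer {

    /** Any blocking operation to time; may throw checked exceptions (hypothetical helper type). */
    interface TimedAction {
        void run() throws Exception;
    }

    /**
     * Returns the wall-clock time, in milliseconds, that the action took.
     * System.nanoTime() is monotonic, so it suits elapsed-time measurement
     * better than System.currentTimeMillis().
     */
    static long elapsedMillis(TimedAction action) throws Exception {
        long start = System.nanoTime();
        action.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the real work. With HBase 0.94 the action might be:
        //   table.put(puts);          // send the batch of Puts
        //   table.flushCommits();     // push any client-side buffered writes
        //   admin.flush(tableName);   // ask the region servers to flush the memstore
        // (table, puts, admin and tableName are assumed to be set up elsewhere)
        long ms = elapsedMillis(() -> Thread.sleep(50));
        System.out.println("Elapsed: " + ms + " ms");
    }
}
```

To estimate the compression penalty, you would run the same batch against a table with and without compression on the column family and compare the elapsed times over several runs.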

Other Tips

By calling flush, you force HBase to flush the memstore, which leads to suboptimal performance.

Why not simply grep the RegionServer logs to see how long each memstore flush takes? cat hbase-regionserver*.log | grep "Finished memstore flush"

You will get the flushed size in MB and the time each flush took.
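If you want to aggregate these numbers across runs (for example, compressed versus uncompressed column families), the same grep can be done in Java. This is a minimal sketch with a hypothetical class name; it assumes that each matching line reports its duration as "in <N>ms", which is an assumption about the log format rather than something stated above, so adjust the pattern to whatever your region servers actually log.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlushLogStats {

    // Marker text from the grep command above.
    private static final String MARKER = "Finished memstore flush";
    // ASSUMPTION: the duration appears as " in <N>ms"; verify against your own logs.
    private static final Pattern DURATION = Pattern.compile(" in (\\d+)ms");

    /** Returns the flush duration in ms, or -1 if the line is not a parsable flush line. */
    static long parseFlushMillis(String line) {
        if (!line.contains(MARKER)) {
            return -1;
        }
        Matcher m = DURATION.matcher(line);
        return m.find() ? Long.parseLong(m.group(1)) : -1;
    }

    public static void main(String[] args) throws IOException {
        long count = 0;
        long totalMs = 0;
        // Each argument is a region server log file.
        for (String file : args) {
            // ISO-8859-1 decodes any byte sequence, so stray non-UTF-8 bytes cannot fail the read.
            try (BufferedReader reader =
                     Files.newBufferedReader(Paths.get(file), StandardCharsets.ISO_8859_1)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    long ms = parseFlushMillis(line);
                    if (ms >= 0) {
                        count++;
                        totalMs += ms;
                    }
                }
            }
        }
        System.out.println("flushes=" + count + " totalMs=" + totalMs
                + (count > 0 ? " avgMs=" + (totalMs / count) : ""));
    }
}
```

Usage would be along the lines of java FlushLogStats hbase-regionserver*.log, letting the shell expand the glob just as in the grep command.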

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow