Question

I'm trying to determine the write performance penalty of using column family compression on an HBase table. However, the API for initiating a flush (admin.flush(tableName)) is an asynchronous, non-blocking operation. How can I determine how long it takes to flush a batch of Puts to disk?

Update: I am currently targeting HBase 0.94.2, and the documentation at http://archive.cloudera.com/cdh4/cdh/4/hbase-0.94.2-cdh4.2.0/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#flush%28String%29 states that admin.flush is asynchronous.

Solution

admin.flush(tableNameOrRegionName) is a synchronous operation, not an asynchronous one. Look here.
To measure how long your Put operations take, you can simply use the time command, or, in Java, take the difference between two System.nanoTime() readings around the call.
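The System.nanoTime() approach can be sketched as follows. This is a minimal illustration, not HBase-specific code: `FlushTimer`, `ThrowingAction`, and `timeNanos` are hypothetical names introduced here, and the `Thread.sleep` call is only a stand-in for the real work. With an HBase client on the classpath, the timed action would be your batch of Puts followed by admin.flush(tableName), assuming the flush call does block until completion as this answer states.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: measures wall-clock time of an arbitrary action.
final class FlushTimer {

    // Hypothetical functional interface so the timed action may throw.
    interface ThrowingAction {
        void run() throws Exception;
    }

    // Runs the action once and returns the elapsed time in nanoseconds.
    static long timeNanos(ThrowingAction action) throws Exception {
        long start = System.nanoTime();
        action.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the real work. In an HBase client this would be
        // the Puts followed by admin.flush(tableName).
        long elapsed = timeNanos(() -> Thread.sleep(50));
        System.out.println("elapsed ms: " + TimeUnit.NANOSECONDS.toMillis(elapsed));
    }
}
```

System.nanoTime() is preferable to System.currentTimeMillis() for this kind of interval measurement because it is not affected by wall-clock adjustments.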

Other tips

By calling flush you are forcing HBase to flush the memstore, which results in suboptimal performance.

Why not just grep the region server logs to see how long each memstore flush takes? cat hbase-regionserver*.log | grep "Finished memstore flush"

Each matching line reports the size flushed, in MB, and the time the flush took.
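If you want the durations as numbers rather than raw log lines, the same filter can be sketched in Java. The only text taken from the answer above is the "Finished memstore flush" marker; the assumption that the duration appears later on the same line as "in <N>ms" should be checked against your own region server logs. `FlushLogScan` and `flushMillis` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts flush durations from region server log lines.
final class FlushLogScan {

    // Assumed line shape: "... Finished memstore flush ... in <N>ms ..."
    private static final Pattern FLUSH =
            Pattern.compile("Finished memstore flush.*?\\bin (\\d+)ms");

    // Returns the duration in milliseconds for every matching line.
    static List<Long> flushMillis(List<String> lines) {
        List<Long> durations = new ArrayList<>();
        for (String line : lines) {
            Matcher m = FLUSH.matcher(line);
            if (m.find()) {
                durations.add(Long.parseLong(m.group(1)));
            }
        }
        return durations;
    }

    public static void main(String[] args) throws Exception {
        // Reads log lines from standard input, e.g. piped from cat.
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        for (long ms : flushMillis(lines)) {
            System.out.println(ms + " ms");
        }
    }
}
```

Lines that do not match the assumed shape are skipped silently, so an empty result may mean the log format differs rather than that no flush happened.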

Licensed under: CC-BY-SA with attribution