質問

It's a simple question with apparently a multitude of answers.

Findings have ranged anywhere from:

a. 22 bytes as per Basho's documentation: http://docs.basho.com/riak/latest/references/appendices/Bitcask-Capacity-Planning/

b. 450~ bytes over here: http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-August/005178.html http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-May/004292.html

c. And anecdotal records that state overheads anywhere in the range of 45 to 200 bytes.

Why isn't there a straight answer to this? I understand it's an intricate problem - one of the mailing list entries above makes it clear! - but is even coming up with a consistent ballpark so difficult? Why isn't Basho's documentation clear about this?


I have another set of problems related to how I am to structure my logic based on the key overhead (storing lots of small values versus "collecting" them in larger structures), but I guess that's another question.

役に立ちましたか?

解決

The static overhead is stated on our capacity planner as 22 bytes because that's the size of the C struct. As noted on that page, the capacity planner is simply providing a rough estimate for sizing.

The old post on the mailing list by Nico you link to is probably the best complete accounting of bitcask internals you will find and is accurate. Figuring in the 8bytes for a pointer to the entry and the 13bytes of erlang overhead on the bucket/key pair you arrive at 43 bytes on a 64 bit system.

As for there not being a straight answer ... actually asking us (via email, the mailing list, IRC, carrier pigeon, etc) will always produce an actual answer.

他のヒント

Bitcask requires all keys to be held in memory. As far as I can see the overhead referenced in a) is the one to be used when estimating the total amount of RAM bitcask will require across the cluster due to this requirement.

When writing data to disk, Riak stores the actual value together with various metadata, e.g. the vector clock. The post mentioning 450 bytes listed in b) appears to be an estimate of the storage overhead on disk and would therefore probably apply also to other backends.

Nico's post seems to contain a good and accurate explanation.

ライセンス: CC-BY-SA帰属
所属していません StackOverflow
scroll top