سؤال

I came across the following line in an article where this internet technology firm talks about how they baked social features into their application:

Apache Thrift, Krati Data Store, JavaEWAH Compressed Bitmaps and JRuby forms the part of our remote service which stores our social graph in high-performing persistent compressed bitmap format.

I am trying to make sense out of this. Till now I have figured out what is meant by Apache Thift (and why it is to be used), JavaEWAH, bit sets, social graph and GUI analysis. Krati Data source does not seem to have a good wiki/tutorial for itself. Furthermore I cannot understand the setup, as to how social graph is being stored and processed using bitsets and the mentioned technology.

If you could explain the same and guide me to relevant resources. Alternatively if you can suggest a better alternative to the stack so described.

هل كانت مفيدة؟

المحلول

Ok, let's put some basics upfront:

I guess, your article is that one: http://www.nextbigwhat.com/technology-implementation-for-social-features-297/

http://en.wikipedia.org/wiki/Social_graph 'The social graph in the Internet context is a graph that depicts personal relations of internet users'

http://thrift.apache.org/ combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml and Delphi and other languages.

https://github.com/krati/krati Krati is a simple persistent data store with very low latency and high throughput. It is designed for easy integration with read-write-intensive applications with little effort in tuning configuration, performance and JVM garbage collection.

http://code.google.com/p/javaewah/ The bit array data structure is implemented in Java as the BitSet class.... JavaEWAH is a word-aligned compressed variant of the Java bitset class.

http://jruby.org/apidocs/serialized-form.html ....

----- Here's are my interpretations:

The context of the article is technology impementation. So they listed everything. In this context I guess we can ignore apache Thrift for now, as this is just the glue, which they use to attach technologies to each other. Also jrubi forms goes somewhat out of scope for the social graph considerations. Yes a social graph needs input and output, but forms addresses the topic which level of details comes from there.

The interesting part is krati and javaewah. Well reading the article makes obvious, that they implement their social graphs via memberships. This can be about groups or roles or something similar. Memberships can be implemented as bitmap: Have a group with a bitmap with one bit per each user. Each Bit can be addressed to check if the user is member or not. As simple as that. The Bitmaps are made up by Krati and than stored in/managed by JavaEWAH. The cons is: The more users, the bigger does the bitmap go. The Pro: It is FAST.

In relational databases each relation would be implemented as foreign key 2 foreign key pair (which causes some index overhead >eg. 2 ints for the keys and then 2*2+x ints for the double index, whereby the x debends on the database). Especially with lots of memberships per group this can get a disk space utilization challenge. So I guess in such cases the compressed BitMap is implementation is even better in terms of storage utilization.

UPDATE---

One could write books on the whole topic. I guess I need to make a point here. However good starting points from here are:

http://www.slideshare.net/lemire/all-about-bitmap-indexes-and-sorting-them

https://github.com/jingwei/krati/commit/ab1432003e59a07269d23c1cb307625b0e8c5be2

http://en.wikipedia.org/wiki/Data_store http://en.wikipedia.org/wiki/Key-value_store (to get an idea about different database concepts than just the relative one)

http://dev.mysql.com/doc/refman/5.0/en/innodb-physical-record.html (to get some indication what about the costs of a foreign key 2 foreign key relation)

مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top