Pergunta

I assume that each person on Facebook is represented as a node (of a Graph) in Facebook, and relationship/friendship between each person(node) is represented as an edge between the involved nodes.

Given that there are millions of people on Facebook, how is the Graph stored?

Foi útil?

Solução

Strange as it sounds, graphs and graph databases are typically implemented as linked lists. As alluded to here, even the most popular/performant graph database out there (neo4j), is secretly using something akin to a doubly-linked list.

Representing a graph this way has a number of significant benefits, but also a few drawbacks. Firstly, representing a graph this way means that you can do edge-based insertions in near-constant time. Secondly, this means that traversing the graph can happen extremely quickly, if we're only looking to either step up or down a linked list.

The biggest drawback of this though comes from something sometimes called The Justin Bieber Effect, where nodes with a large number of connections tend to be extremely slow to evaluate. Imagine having to traverse a million semi-redundant links every time someone linked to Justin Bieber.

I know that the awesome folks over at Neo4j are working on the second problem, but I'm not sure how they're going about it, or how much success they've had.

Outras dicas

Having worked with Facebook data a bit (harvested from Facebook users) we stored it just as a pair of values: USER_ID, FRIEND_USER_ID.

But I guess your questions is a bit deeper? You can store it in different ways, depending on your research question. One interesting option is triads for example - http://mypersonality.org/wiki/doku.php?id=list_of_variables_available#triads

When I worked with social network data, we stoted the "friendship" relation in a database in the table Friends(friend_a, friend_b, ...) with a B-Tree index on (friend_a, friend_b) plus also some partitioning.

In our case it was a little bit different since the graph was directed, so it wasn't really "friendship", but rather "following/follower" relationship. But for friendship I would just store two edges: both (friend_a, friend_b) and (friend_b, friend_a)

We used MySQL to store the data, if it matters, but I guess it shouldn't.

Licenciado em: CC-BY-SA com atribuição
scroll top