Question

I've created a graph model for a social network and need some concrete advice on how the design will scale. Pardon the n00bness of these questions, but I'm not finding many clear examples out there...

NOTE: the status update and activity nodes/relationships are linked lists, with the newest entries always placed at the head of the list.

  1. Linked lists allow for news feed generation, but there could be hundreds of records per user. I presume a limit clause isn't sufficient, even though the data is in descending order by date. Do I need a separate linked list that holds only the 10 most recent status/activity updates (constantly replacing the head of that list) to get better activity feed generation, or will one properly sorted list do the job with a limit clause?

  2. These nodes all have properties (JSON data with content, IDs, etc.). How do "global" indexes come into play here, so that I can find, for example, users that like Depeche Mode without waiting a lifetime for results? I know how to add a node to an index; I'm just wondering if I'm missing part of the picture here.

  3. Security: logins and passwords. I presume a graph database could store them, but I'd also presume that's a security risk. Would it be better to keep these in Postgres or similar?

  4. How would you improve this model to handle scalability? Imagine 20 million users banging away on this.

  5. Imagine 40 million users - what's wrong with this model when it comes to scalability?

[Image: diagram of the graph model]


Solution

Part 1.

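You can write Cypher or Gremlin queries that do what you want. Remember that you can traverse both forwards and backwards along edges. Given a user, pulling up the last ten things they did should take roughly constant time: the traversal starts at the head of the list and stops after ten hops, no matter how long the list is. One properly ordered list with a limit is enough; you don't need a second "recent" list.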
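To illustrate why a limit on a head-first linked list is cheap, here is a minimal in-memory sketch in Python. It is not Cypher or Gremlin and not tied to any graph database; the `User` and `StatusUpdate` classes and the `latest` method are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterator, List, Optional


@dataclass
class StatusUpdate:
    """One node in a user's update list (hypothetical model)."""
    text: str
    next: Optional["StatusUpdate"] = None  # points at the next-older update


class User:
    def __init__(self, name: str) -> None:
        self.name = name
        self.head: Optional[StatusUpdate] = None  # newest update

    def post(self, text: str) -> None:
        # The new update becomes the head; the old head becomes its successor.
        self.head = StatusUpdate(text, next=self.head)

    def updates(self) -> Iterator[StatusUpdate]:
        # Lazily walk the list from newest to oldest.
        node = self.head
        while node is not None:
            yield node
            node = node.next

    def latest(self, limit: int = 10) -> List[str]:
        # islice stops the walk after `limit` hops, so the cost depends on
        # `limit`, not on how many updates the user has in total.
        return [update.text for update in islice(self.updates(), limit)]


alice = User("alice")
for i in range(500):
    alice.post(f"update {i}")
print(alice.latest(3))
```

The same idea applies to a graph query: start at the user, follow the "next" relationship a bounded number of times, and stop.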

Part 2.

If you are representing a band as an entity of a certain type, index on that attribute. Then you'll be able to pull out that node and traverse outwards to find all the users who like that band. If you don't have an independent entity, or it is somehow implicit, you'll want to enable full text search for your respective graph database.
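As a rough sketch of the "index, then traverse" pattern described above, the following Python models the index as a dictionary from band name to band node, and the "likes" edges as a set of users per band. The class `SocialGraph` and its methods are hypothetical; a real graph database would provide its own index and traversal API.

```python
from collections import defaultdict
from typing import Dict, Set


class SocialGraph:
    """Toy in-memory graph: band nodes, plus incoming 'likes' edges."""

    def __init__(self) -> None:
        # The "global" index: band name -> band node id.
        self.band_index: Dict[str, int] = {}
        # Incoming 'likes' edges: band node id -> users who like that band.
        self.liked_by: Dict[int, Set[str]] = defaultdict(set)
        self._next_id = 0

    def add_band(self, name: str) -> int:
        node_id = self._next_id
        self._next_id += 1
        self.band_index[name] = node_id
        return node_id

    def like(self, user: str, band_name: str) -> None:
        self.liked_by[self.band_index[band_name]].add(user)

    def users_who_like(self, band_name: str) -> Set[str]:
        # Step 1: one index lookup finds the band node.
        band_id = self.band_index.get(band_name)
        if band_id is None:
            return set()
        # Step 2: traverse outwards along the edges attached to that node.
        # Users who do not like the band are never visited.
        return set(self.liked_by.get(band_id, ()))


graph = SocialGraph()
graph.add_band("Depeche Mode")
graph.add_band("New Order")
graph.like("alice", "Depeche Mode")
graph.like("bob", "Depeche Mode")
graph.like("carol", "New Order")
print(sorted(graph.users_who_like("Depeche Mode")))
```

The point of the design is that the lookup cost scales with the number of fans of one band, not with the total number of users.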

Part 3.

Learn more about security. The only thing you should be storing is a properly salted hash of the user's password, never the password itself. As long as you do that and follow good security practices, you would be fine using any graph db.
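One possible way to produce such a hash, sketched with Python's standard library only. The function names `hash_password` and `verify_password`, the record layout, and the iteration count are assumptions made for this example; the answer does not prescribe a particular algorithm, and PBKDF2 is used here simply because it ships with Python.

```python
import hashlib
import hmac
import os
from typing import Dict, Union

Record = Dict[str, Union[str, int]]


def hash_password(password: str, iterations: int = 600_000) -> Record:
    """Return a record that is safe to store as node properties."""
    salt = os.urandom(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    # Only the salt, the work factor, and the derived hash are kept.
    return {"salt": salt.hex(), "iterations": iterations, "hash": digest.hex()}


def verify_password(password: str, record: Record) -> bool:
    """Re-derive the hash from the stored salt and compare the results."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256",
        password.encode("utf-8"),
        bytes.fromhex(str(record["salt"])),
        int(record["iterations"]),
    )
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(candidate, bytes.fromhex(str(record["hash"])))


stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", stored))
print(verify_password("wrong guess", stored))
```

The resulting record contains only hex strings and an integer, so it can live on a user node in a graph database or in a row in Postgres equally well.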

Part 4/5.

Once you have one user, worry about the next thousand.

When you have a thousand users, worry about the next hundred thousand.

When you have one hundred thousand, worry about the next million.

When you have a million users, you can start worrying about the questions you asked.

Until you have at least 0.1% of the users/volume you want to scale to, it's mental masturbation to try and ask questions about how to scale up to a certain size.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow