Question

I am trying to build a Twitter-like follow mechanism. A user takes an action; we make a list of all of that user's followers and then populate each of their streams with some information. Since this can take some time (if you have 10,000 followers, that's 10,000 streams to insert information into, i.e. perhaps 10,000 SQL calls), I want to make sure this is done in the background while the user who took the action goes on with their life.

So, the strategy I'm considering is this:

  • The user takes an action.
  • The PHP script opens another PHP script that does all the work, which might take a second or two.
  • Meanwhile, the user who took the action can go on with their life; their script just continues, and it's fast.

Thoughts? I also played around with using a queue, something like SQS, but this approach sounds like it might also work? Plus it has the advantage (for me) that it's easier to test locally and easier to run on non-EC2 hosts.

And if this is a good approach, how would I open a PHP script from within a PHP script? Could it be as simple as doing a GET on the URL where that script lives (assuming the script lives at a URL)?

Solution

The way this is described, it sounds like you want to replicate/duplicate the first user's post for everyone who follows that user. That's going to be a data storage nightmare.

You should look at it from the other point of view. Consider the following model:

User A posts what he ate for breakfast. This is stored once in a table with his user id.

User B looks at his "stream". This is a dynamically created list of posts. At this point User B is following 50 people, so User B's script fetches the 50 most recent posts from whomever he is following and displays them on his "stream".

With this model, you never store more than one copy of each frivolous breakfast update. Also, the number of followers does not scale up the processing time needed to publish the twit. I mean tweet.

Clarification

You'll just need to do some normalization. So you'll have a users table, a users_following table, and a posts table. The query would look akin to:

SELECT posts.* FROM users_following
         JOIN posts ON posts.user_id = users_following.followed
         WHERE users_following.follower = $idOfUserB
         ORDER BY posts.created DESC LIMIT 50;
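
From PHP, the stream page could run that query along these lines. This is a minimal sketch, assuming a PDO connection and the tables implied by the query above (users_following with follower/followed columns, posts with user_id and created columns); the function name is illustrative:

<?php
// Build User B's stream on demand: fetch the 50 most recent posts
// from everyone he follows. Nothing is copied at write time.
function getStream(PDO $pdo, int $followerId, int $limit = 50): array
{
    // LIMIT is inlined as a cast integer to sidestep PDO's quirks
    // with binding parameters inside LIMIT clauses.
    $sql = "SELECT posts.*
              FROM users_following
              JOIN posts ON posts.user_id = users_following.followed
             WHERE users_following.follower = :follower
          ORDER BY posts.created DESC
             LIMIT " . (int) $limit;

    $stmt = $pdo->prepare($sql);
    $stmt->execute([':follower' => $followerId]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

Indexes on users_following(follower) and posts(user_id, created) would keep this fast as the tables grow.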

OTHER TIPS

If you want your site to scale at all:

  • First, you need a message queue, for example Redis, beanstalkd, or Gearman (a producer-side sketch follows this list).
  • Second, you need to do your operations in memory, using for example Redis or memcached. Make sure you have enough memory to keep the active dataset in memory.
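
As a concrete example of the producer side, here is a minimal sketch assuming the phpredis extension; the queue name "fanout_queue" and the job fields are illustrative choices, not fixed names:

<?php
// When the user takes an action, enqueue a job and return immediately.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$job = json_encode([
    'type'    => 'new_post',
    'user_id' => $userId,  // author of the post (assumed set upstream)
    'post_id' => $postId,  // post to fan out (assumed set upstream)
]);

$redis->lPush('fanout_queue', $job);
// The web request returns right away; a background worker does the rest.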

(if you have 10,000 followers, that's 10,000 streams to insert information into, i.e. perhaps 10,000 SQL calls)

10,000 SQL calls has fail whale written all over it. I would not use MySQL for such an application (or would at least pair it with memcached); use Redis instead. Keep the active dataset in memory and keep the data model as simple as possible.

And if this is a good approach, how would I open a PHP script from within a PHP script?

Don't do that. Push messages onto a Redis list via LPUSH and have a daemon process read them via BLPOP. I would populate the streams of online users first and then those of offline users; offline users don't mind a delay because they aren't online. Rather than copying the post, put a reference to its key on the stream of every user following that person, and fetch all the posts at once via MGET.
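
A sketch of that worker daemon, again assuming the phpredis extension. The key layout (post:{id}, followers:{id}, stream:{id}) is an assumption for illustration, and the online/offline prioritization described above is omitted for brevity:

<?php
// Long-running worker, started from the CLI rather than via a URL.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

while (true) {
    // Block until a job arrives; blPop returns [queueName, payload].
    [, $payload] = $redis->blPop(['fanout_queue'], 0);
    $job = json_decode($payload, true);

    $postKey = 'post:' . $job['post_id'];

    // Push a *reference* to the post onto each follower's stream list
    // instead of copying the post itself 10,000 times.
    $followers = $redis->sMembers('followers:' . $job['user_id']);
    foreach ($followers as $followerId) {
        $redis->lPush('stream:' . $followerId, $postKey);
    }
}

Reading a stream is then two round trips: lRange('stream:' . $userId, 0, 49) for the latest 50 post keys, followed by mGet on those keys to fetch the posts themselves.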

Could it be as simple as doing a GET on the URL where that script lives?

Again, don't call URLs; use a message queue. Calling a URL gives you undesired overhead.

Awesome. Back to SQL :) Will this be fast even if you're following 500 people?

SQL will give you fail whales big time under high load. At the very least you will need memcached, but I would use Redis instead.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow