Question

Let's say you're composing a blogging website. It displays recent blog posts by multiple authors sorted by "priority". Highest priority on top. Priority is determined by some formula involving:

  1. how recently the post was published
  2. how many comments it attracted

Order must always be accurate in real-time.

Sorting by priority is easy. The problem is let's say our website is hugely popular and comments fly in at the hundreds-per-minute rate. They fly in on dozens of posts.

Is there a pattern to handle this scenario? In other words, can we do any better than just updating the priority field whenever there's a comment on a post, and then sorting posts each and every time the page is loaded? Caching post order doesn't help much because heavy user activity causes order to change frequently.

With "pattern" I'm speaking from both a code and database schema point of view.

Était-ce utile?

La solution

You can use a balanced binary tree (e.g. a red-black tree) to store the sorted index, which should make it quicker to update than if you were sorting the entire index every time.

Using Java-ish pseudocode this would look like

Tree tree;

Node {
    int priority;

    incrementPriority() {
        priority = priority + 1;
        if(priority > tree.nextHighestNode(this)) {
            tree.remove(this);
            tree.add(this);
        }
    }

    decrementPriority() {
        priority = priority - 1;
        if(priority < tree.nextLowestNode(this)) {
            tree.remove(this);
            tree.add(this);
        }
    }
}

If changing a node's priority means that it's in an invalid tree location (meaning that it is higher than what ought to be the next-highest node, or lower than what ought to be the next-lowest node), then it's removed and re-added to the tree (which takes care of rebalancing itself). Insertion is O(log(n)), but usually (when there's no insertions/removals) updating the priority is a constant time operation.

Red-black trees are how balanced binary trees are usually implemented, but there are alternatives e.g. a Tango tree is probably more appropriate here since it's an online implementation. The biggest problem is going to be with concurrency - ideally you would want to be able to implement the nodes' priority fields using some sort of AtomicInteger (permits atomic increments and decrements; quite a few languages have something like this) so that you won't need to lock the field each time you change it, but it will be difficult to atomically compare the priority to the adjacent nodes' priorities.

As an alternative, you can store everything in an array or a linked list and swap adjacent elements when their priorities change - this way you won't need to do a full sort each time, and unlike the balanced binary tree where removing and inserting an element is O(log(n)), swapping two adjacent array/list elements is constant time. The only problem is that adding an entirely new element will be costly with an array as you will need to shift all of the array's elements; it will be O(n) with a list as well because you'll need to traverse the list until you find the correct location to insert the item, but this is probably preferable to the array because you won't need to shift any adjacent elements (which will reduce the amount of locking you need to do).

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top