Question

I'm looking for a solution to an edge case scenario where a client continually asking the server for what's new will fail.

In this example, I'm not using timestamps because of another edge case problem. That's handled in this question: A Client Walks Into a Server And Asks "What's New?" – Problems With Timestamps

Assume we're using sequence numbers. There's a single sequence number that is atomically updated every time the table is changed. When any row is updated it records the current sequence. Then it's just a matter of the client asking for what's new since the last sequence it asked for. Simple? Yes, but...

Failure scenario:

Sequence starts at 1
1) Client A starts update. Updates sequence to 2
2) Client B starts update. Updates sequence to 3
3) Client B updates rows with sequence 3
4) Client C requests changes >1.  Gets B's changes. Good.
5) Client A updates rows with sequence 2. 
6) Client C requests changes >3.  Gets nothing. Doesn’t get Client A’s changes.

Because we're using a MongoDB, I don't believe we can easily lock the sequence during an update. And, if we could, I worry about performance.

Having clients ask "what's new" repeatedly seems like a common use case and I find it surprising not to find a better wealth of best practices on this.

Any ideas on solving this scenario or recommending a better, preferably platform agnostic, solution for asking for changes?

Was it helpful?

Solution

One thing you can do is maintain a heap of sequence numbers that are in use as well as the "next sequence number to assign". Try the following:

  • When you grab a sequence number, put it in an "in-use" map.
  • When you are done making changes with that sequence number, remove it from the "in-use" std::set
  • Keep track of the minimum in the set. Whenever the minimum changes from "x" to "y", have Client C request values from x to y, but no greater than y.

So, in your example, when you update the sequence to 2, 1 is put in the in-use set. Then, when you update to 3, 2 is put in there, and the set contains 1 and 2. When the work for 2 is done, 2 is removed from the set, but client C does not pick up any changes because the minimum, 1, is unchanged. When client A is done with 1, the minumum changes from 1 to 3, and client C may read the changes from 1 to 3.

For a more complex example, suppose you have 6 clients use sequence numbers 11, 12, 13, 14, 15, and 16, yet finish in the following order: 12, 13, 11, 15, 14, 16 (which is the order they are removed from the "in-use" set. In this example, after 11 is gone, client C may read 11 through 13, because the minumum changes from 11 to 14. Then, after 14 is gone, client C may read 14 and 15, as the minimum changes from 14 to 16. Then, when 16 is gone, client C can read 16.

This is basically the algorithm we use in TokuMX replication that decides what oplog entries may be replicated to secondaries. Clients A and B would be threads doing writes to the oplog and Client C would be a tailable cursor from the secondary pulling oplog data.

OTHER TIPS

This is clarifying and providing some examples of Zardosht Kasheff's correct answer above.

  • At the start of an update we add the sequence number to an in-use list
  • Any client requesting “what’s new” can’t ask for anything >= the lowest number on the list.
  • Thus if a couple clients were updating with sequences 10, 11, 12, a client would ask for everything from > sequenceOfLastItemReceivedByClient and < 10 (the lowest number on the in use list.)

Here's my original example solved:

Sequence starts at 1
Client A starts update. Uses sequence 2.  In use table now has “2"
Client B starts update. Uses sequence 3. In use table now has “2,3"
Client B updates rows with sequence 3. In use table now has “2"
Client C wants to request >1 .  However, min in-use table is 2.  So it requests >1 and <2 which is none.
Client A updates rows with sequence 2. In use table now empty.
Client C wants to request changes >1. Gets A and B.

And Zardosht's example

Clients start updates with sequences 11, 12, 13, 14, 15, and 16. In use table is "11, 12, 13, 14, 15, 16”
Updates 12, 13, 11 finish.  In use table is “14, 15, 16”
Client C wants to request >1.  Is allowed >1 and <14.  Gets sequences 11, 12, 13
Updates 15, 14, 16 finish.  In use table is empty.
Client C requests >13. Gets 15, 14, 16.

And a final example:

Clients start updates with sequences 11, 12, 13, 14, 15, and 16. In use table is "11, 12, 13, 14, 15, 16”
Updates 12, 16, 11 finish.  In use table is “13, 14, 15”
Client C wants to request >1.  Is allowed >1 and <13.  Gets sequences 11, 12
Updates 13, 14, 15 finish.  In use table is empty.
Client C requests >12. Gets 13, 14, 15, 16

All above examples pass. This seems to be a workable solution.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top