How to ensure message transfer by pull-/polling from a webserver

https://stackoverflow.com/questions/4030433

26-09-2019

Question

This builds on "How to send messages between Companies". If I decide that company S (the supplier) should poll orders from company B in some simple HTTP-based way, what is the best implementation?

I assume company B has a web server running and that the backend database of this web server is durable. We should make as few assumptions as possible about the storage processes at S and about whether S is able to keep state (e.g. a list of already transmitted GUIDs).
The Internet connection between B and S is unreliable.
We have to reach eventual consistency, meaning that at some point in time all orders between B and S should be transferred.

What is the best practice to implement such a system?

Solution

One approach to this kind of problem is to use some kind of queueing product; as an IBM person I immediately consider MQ. However, as I'm not actually an MQ person myself, like you I would probably be happy with the service-based approach you are taking.

There are two possible approaches that come to mind. One is to use WS-ReliableMessaging, which pushes the reliability problem down into the web service infrastructure. The other is to hand-crank your own reliable protocol on top of simple, but unreliable, services.

I've not got serious practical experience of implementing a system with WS-ReliableMessaging. I do believe that it can be made to work, but it does require some degree of control over the participants: as it's a comparatively recent standard, we can't guarantee that any given IT shop will have an implementation to hand, and interoperability between vendors may be an issue. The more control I have over the software stacks at each end, the more inclined I would be to use WS-ReliableMessaging. [I should mention WS-AtomicTransaction too, which can also be used to build reliable services; the same interop concerns apply.]

So what about roll-your-own? The key here is to make all services idempotent. As we don't have transactional guarantees that span the two systems we must assume that any given service call may fail with unknown outcome.

I'm going to assume that B wants to have confirmation that S has taken an order, therefore we need to update information at both B and S when an order is transferred.

B must offer services such as these:

 Give me the next order(s)

 I have stored {orders ...}

So how do we define "next"? The simplest case works nicely if our volumes allow a single "thread" of transfer. Then B ticks off the sent orders one at a time, and the orders have a monotonically increasing ID. We can then simplify to:

 I have stored order <65,004> please give me the next

Note that this is an idempotent request: it can safely be repeated many times. Also note that S must anticipate the possibility of getting the same order twice, and check for duplicates.
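The single-thread scheme above can be sketched as a minimal in-memory Python model. The names `OrderServer`, `OrderClient`, `stored_and_next` and `sync` are hypothetical, dicts stand in for B's durable database and S's storage, and the HTTP layer is left out. What it shows is that the combined "I have stored N, give me the next" call returns the same answer however often it is repeated, and that storing an order S already holds is a no-op.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OrderServer:
    """Company B's side (hypothetical in-memory stand-in for B's durable DB)."""
    orders: dict[int, str] = field(default_factory=dict)  # monotonically increasing IDs
    acknowledged_up_to: int = 0  # highest order ID S has confirmed storing

    def stored_and_next(self, last_stored_id: int) -> tuple[int, str] | None:
        """'I have stored order <last_stored_id>, please give me the next.'

        Idempotent: repeating the call returns the same answer, so S can
        retry freely after a call that failed with unknown outcome.
        """
        # max() makes repeated or stale acknowledgements harmless.
        self.acknowledged_up_to = max(self.acknowledged_up_to, last_stored_id)
        later = [oid for oid in self.orders if oid > last_stored_id]
        if not later:
            return None
        next_id = min(later)
        return next_id, self.orders[next_id]


@dataclass
class OrderClient:
    """Company S's side: must tolerate receiving the same order twice."""
    stored: dict[int, str] = field(default_factory=dict)

    def last_stored_id(self) -> int:
        return max(self.stored, default=0)

    def store(self, order_id: int, body: str) -> None:
        # Duplicate check: an order that is already stored is left unchanged.
        self.stored.setdefault(order_id, body)


def sync(server: OrderServer, client: OrderClient) -> None:
    """Pull orders one at a time until B has nothing newer."""
    while (reply := server.stored_and_next(client.last_stored_id())) is not None:
        client.store(*reply)
```

Because S derives its position from the orders it has actually stored, it needs no separate list of transmitted GUIDs; after a crash it simply asks again from its highest stored ID.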

OTHER TIPS

What you are probably looking for is two-phase commit. It is well described on the internet, for example here:

http://en.wikipedia.org/wiki/Two-phase_commit_protocol

The gist of it:

The commit process proceeds as follows:

* Phase 1
      o Each participating resource manager coordinates local 
        operations and forces all log records out:
      o If successful, respond "OK"
      o If unsuccessful, either allow a time-out or respond "OOPS" 
* Phase 2
      o If all participants respond "OK":
            * Coordinator instructs participating resource managers to "COMMIT"
            * Participants complete operation writing the log record
              for the commit 
      o Otherwise:
            * Coordinator instructs participating resource managers to "ROLLBACK"
            * Participants complete their respective local undos

Should work for any kind of data.
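The two phases above can be sketched as an in-process Python model. This is a minimal illustration, not a production 2PC: `Participant`, `prepare`, `two_phase_commit` and the `can_commit` flag are hypothetical, and network calls, time-outs, the coordinator's own decision log and crash recovery are all left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Vote(Enum):
    OK = "OK"
    OOPS = "OOPS"


@dataclass
class Participant:
    """Hypothetical resource manager; can_commit simulates local success or failure."""
    name: str
    can_commit: bool = True
    log: list[str] = field(default_factory=list)

    def prepare(self) -> Vote:
        # Phase 1: do the local work and force the log record out before voting.
        if self.can_commit:
            self.log.append("PREPARED")
            return Vote.OK
        return Vote.OOPS

    def commit(self) -> None:
        self.log.append("COMMIT")

    def rollback(self) -> None:
        # Local undo of whatever prepare() did.
        self.log.append("ROLLBACK")


def two_phase_commit(participants: list[Participant]) -> bool:
    """Coordinator: commit only if every participant votes OK."""
    # Phase 1: collect votes (a time-out would count as OOPS; not modelled here).
    votes = [p.prepare() for p in participants]
    # Phase 2: broadcast the decision.
    if all(v is Vote.OK for v in votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

A real coordinator must durably record its decision before phase 2; if it crashes between the phases, prepared participants are left waiting for an instruction.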

Okay, first of all you can't guarantee anything over an unreliable link. The Two Generals' Problem proves this for both deterministic and nondeterministic protocols. All you can do is mitigate the unreliability to an acceptable degree.

In your case, the easiest method is this: once the server receives a poll request, it sends x replies, all with the same GUID. For example:

S: B, anything new?
S: B, anything new?
S: B, anything new?
B: Yes, S, I need a jacket (order #123).
S: B, anything new?
B: Yes, S, I need a jacket (order #123).
S: B, anything new?
B: Yes, S, I need a jacket (order #123).
S: B, anything new?
B: Yes, S, I need a jacket (order #123).
B: Yes, S, I need some shoes (order #124).
S: B, anything new?
B: Yes, S, I need a jacket (order #123).
B: Yes, S, I need some shoes (order #124).
S: B, anything new?
B: Yes, S, I need some shoes (order #124).
S: B, anything new?
B: Yes, S, I need some shoes (order #124).
...

S may get spammed with orders, but since the order # is sent with every reply, it's not a big deal. If we missed it before, we're getting it now. If we didn't get it before, woohoo! We have it now. The system works! You'll notice that B sends the jacket order (#123) five times in my example. In a realistic scenario you would probably send a message hundreds or thousands of times, until you reach the desired reliability.
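The redundancy argument can be made concrete. If each copy is lost independently with probability p, all k copies are lost with probability p^k, so at least one arrives with probability 1 - p^k. The sketch below is a hypothetical simulation of that (`delivery_probability`, `send_redundantly` and `deduplicate` are illustrative names); real links tend to lose packets in bursts, which breaks the independence assumption. S's side is just a seen-set keyed on the order number.

```python
from __future__ import annotations

import random


def delivery_probability(loss_rate: float, copies: int) -> float:
    """Chance that at least one of `copies` sends survives.

    Assumes each copy is lost independently with probability loss_rate.
    """
    return 1.0 - loss_rate ** copies


def send_redundantly(orders: list[int], copies: int, loss_rate: float,
                     rng: random.Random) -> list[int]:
    """B sends every order number `copies` times over a simulated lossy link."""
    received = []
    for order_id in orders:
        for _ in range(copies):
            if rng.random() >= loss_rate:  # this copy got through
                received.append(order_id)
    return received


def deduplicate(received: list[int]) -> list[int]:
    """S keeps the first copy of each order number and ignores the rest."""
    seen: set[int] = set()
    unique = []
    for order_id in received:
        if order_id not in seen:
            seen.add(order_id)
            unique.append(order_id)
    return unique
```

For example, with p = 0.5, five copies give 1 - 0.5^5 = 0.96875 and ten copies give about 0.999.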

Now the above solution is processing- and bandwidth-intensive, but it does work. A cleverer method is to do what TCP does: have a three-way handshake.

S: Hello B. Are you there? -> SYN
B: Hello S, yep I'm here. What's up? -> SYN+ACK
S: Oh good, you're there. -> ACK
S: B, anything new?
B: Yes, S, I need a jacket (order #123).

But HTTP already does this, since it runs over TCP. So if something doesn't get somewhere, you'll know: the connection timed out, the connection broke, etc.
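Since the transport reports failures, the client's job reduces to retrying. A minimal sketch, assuming a hypothetical URL such as `http://b.example/orders/next` and an illustrative `poll_with_retry` helper that repeats a GET with exponential back-off until it succeeds. One caveat: an error tells S that the exchange failed, not whether B processed the request, so retrying like this is only safe for idempotent requests such as GET.

```python
from __future__ import annotations

import time
import urllib.request
from typing import Callable


def http_fetch(url: str, timeout: float) -> bytes:
    """Plain HTTP GET; urlopen raises if the connection times out or breaks."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def poll_with_retry(fetch: Callable[[str, float], bytes], url: str,
                    attempts: int = 5, timeout: float = 10.0,
                    backoff: float = 1.0) -> bytes:
    """Repeat an idempotent GET until it succeeds or the attempts run out."""
    last_error: Exception | None = None
    for attempt in range(attempts):
        try:
            return fetch(url, timeout)
        except OSError as exc:
            # URLError, socket time-outs and connection resets are all OSError subclasses.
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(backoff * 2 ** attempt)  # exponential back-off
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error
```

In use, S would call something like `poll_with_retry(http_fetch, "http://b.example/orders/next")`; passing the fetch function in keeps the retry logic testable without a network.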

Now, you could re-implement these scenarios at the application level (enter WS-ReliableMessaging), but really TCP is already reliable. Some critics of these SOAP(ish) frameworks and faux-protocols (they usually work on top of HTTP) accuse them of essentially reinventing the wheel, and the wheel's problems, at a higher level of abstraction.

The bottom line is that any system can fail, including reliable messaging systems.

As far as eventual consistency is concerned, I think you may be confused. Eventual consistency only applies to distributed storage systems where, after a Write(), you may not be able to deterministically retrieve the value with a Read() for some time. This doesn't seem like your problem at all. I mean, I see what you're saying, but in an eventually consistent system, a reliable (enough) connection is assumed between the nodes. You don't make that assumption (even though I think you should; TCP is pretty darn reliable).

Building on what dina mentioned: web services would be a perfect solution to the above problem. A protocol can be agreed upon that defines the number of records.

S ----> D (call a service that lists the record keys)
D ----> S (provide XML of the keys)
S ----> D (request each of the records)
D ----> S (submit the record)
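The four-step exchange above could look like this in Python. It is a sketch under assumptions: the key list is taken to be XML shaped like `<keys><key>A1</key>...</keys>`, and the two services are passed in as plain callables (`list_keys`, `fetch_record`) rather than real web service stubs; all names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from typing import Callable


def parse_key_list(xml_text: str) -> list[str]:
    """Parse a key list shaped like <keys><key>A1</key>...</keys> (assumed format)."""
    return [(el.text or "").strip() for el in ET.fromstring(xml_text).iter("key")]


def synchronise(list_keys: Callable[[], list[str]],
                fetch_record: Callable[[str], str],
                local: dict[str, str]) -> list[str]:
    """Fetch every listed record that is not held locally yet.

    list_keys: stands in for the 'list record keys' service.
    fetch_record: stands in for 'request each of the records'.
    local: records already held; updated in place.
    Returns the keys fetched in this run.
    """
    missing = [key for key in list_keys() if key not in local]
    for key in missing:
        local[key] = fetch_record(key)
    return missing
```

Re-running after an interrupted sync fetches only what is still missing, so the whole procedure can be repeated safely.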

If a new record is entered after the sync, the destination can invoke a service deployed at the source, which would handle the new record.

Since communication is handled by a web service engine, you need not worry about the message parameters. SSL can be added for security.

Cheers!

I think you are trying to say that Company B is a passive participant. S (supplier) just needs the ability to get all of the orders that B posts (eventual consistency). But B doesn't need or care about what orders S already has (no need for commit).

If company B has a semi-accurate clock, you can use the date as the monotonically increasing identifier, depending on the resolution of events (you won't want to poll if you need millisecond resolution anyway). You only use B's clock, so you don't have to worry about synchronization. If B publishes all orders, S can just pick up the orders from where it last left off.
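A sketch of the timestamp cursor, assuming a hypothetical `Order` record stamped by B's clock. It pairs each timestamp with the order ID so that two orders sharing a timestamp are neither skipped nor repeated. It also assumes B stamps orders at publish time; an order that becomes visible later with an older timestamp would be missed by this scheme.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime

Cursor = tuple[datetime, int]  # (timestamp, order ID) of the last order S picked up


@dataclass(frozen=True)
class Order:
    created_at: datetime  # stamped by B's clock only, so S's clock never matters
    order_id: int
    body: str


def orders_since(published: list[Order], cursor: Cursor | None) -> list[Order]:
    """B's side: every order strictly after S's cursor, oldest first."""
    ordered = sorted(published, key=lambda o: (o.created_at, o.order_id))
    if cursor is None:
        return ordered
    return [o for o in ordered if (o.created_at, o.order_id) > cursor]


def advance_cursor(batch: list[Order], cursor: Cursor | None) -> Cursor | None:
    """S's side: remember where it left off (the only state S keeps)."""
    if not batch:
        return cursor
    last = batch[-1]
    return (last.created_at, last.order_id)
```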

I'm not sure if you meant best practices or best trade-offs for an easy-to-implement system. Depending on volume and response time, there is no need to make it a dynamic system if you are polling anyway. Dump orders as text files (named by timestamp) into a directory named by the date and pull them all down (or selectively). You can even store them in directories by the hour or whatever makes sense. HTTP GET is idempotent.

That might be ugly, but it sounds like you don't expect much complexity from company B. Use SSL and auth and it is locked down and encrypted.
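The directory-dump idea can be sketched with the standard library, assuming a hypothetical layout `root/YYYY-MM-DD/<timestamp>.txt` that B's web server exposes for plain HTTP GET; local paths stand in for those URLs here. Writing to a temporary file and renaming it into place is an addition to the answer, so that a poll never picks up a half-written order. The zero-padded timestamp names sort lexically in time order, so S only needs to remember the last file name it fetched.

```python
from __future__ import annotations

import os
import tempfile
from datetime import datetime
from pathlib import Path


def publish_order(root: Path, created_at: datetime, body: str) -> Path:
    """B's side: write one order as root/YYYY-MM-DD/<timestamp>.txt (hypothetical layout)."""
    day_dir = root / created_at.strftime("%Y-%m-%d")
    day_dir.mkdir(parents=True, exist_ok=True)
    final = day_dir / (created_at.strftime("%Y%m%dT%H%M%S%f") + ".txt")
    # Write under a temporary name first so pollers never see a partial file.
    fd, tmp = tempfile.mkstemp(dir=str(day_dir), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(body)
    os.replace(tmp, final)  # on POSIX this rename is atomic
    return final


def new_order_files(root: Path, last_seen: str | None) -> list[Path]:
    """S's side: order files whose timestamped name sorts after `last_seen`."""
    files = sorted(root.glob("*/*.txt"), key=lambda p: p.name)
    return [p for p in files if last_seen is None or p.name > last_seen]
```

Two orders stamped in the same microsecond would collide on one file name, so a real deployment would need a tie-breaker such as the order number in the name.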

If you don't need performance there is nothing wrong with simple. What do you really gain from a complicated protocol?

Licensed under: CC-BY-SA with attribution
