Question

How do you sync data between two processes (say client and server) in real time over network?

I have various documents/datasets constructed on the server, which are downloaded and displayed by clients. Once downloaded, the document receives continuous updates in order to remain fresh.

It seems to be a simple and commonly occurring concept, but I cannot find any tools that provide this level of abstraction. I am not even sure what I am looking for. Perhaps there is a similar concept with solid tool support? Perhaps there is a chain of different tools that must be put together? Here's what I have considered so far:

  • I am required to propagate every change in a single hop (0.5 RTT), which rules out polling (typically >10 RTT) and cache invalidation techniques (1.5 RTT).
  • Data replication and simple notification broadcasts are not an option, because there is too much data and too many changes. Clients must be able to select specific documents to download and monitor for changes.
  • I am currently using message passing pattern, which does the job, but it is hopelessly unproductive. It works at way too low level of abstraction. It is laborious, error-prone, and it doesn't scale well with increasing application complexity.
  • HTTP and other RPC-like techniques are good for the initial fetch, but they encourage polling for subsequent synchronization. When performing reverse requests (from data source to data consumer), change notifications are possible, but it's even more complicated than message passing.
  • Combining RPC (for the initial fetch) with message passing (for updates) turned out to be a nightmare due to the complexity involved in coordinating communication over the two parallel connections as well as due to the impedance mismatch between the two paradigms. I need something unified.
  • WebSocket & Comet are popular methods to implement change notification, but they need additional libraries to be productive and I am not aware of any libraries suitable for my application.
  • Message queues merely put an intermediary on the network while maintaining the basic message passing pattern. Custom message filters/routers allow me to get closer to the live document concept, but I feel like I am implementing custom middleware layer on top of the MQ.

I have tons of additional requirements (native observable data structure API on both ends, incremental updates, custom message filters, custom connection routing, cross-platform, robustness & scalability), but before considering those requirements, I need to find some tools that at least attempt to do what I need. I am trying to avoid in-house frameworks for the standard reasons - cost, time to market, long-term maintenance, and keeping developers happy.

Was it helpful?

Solution

My conclusion at the moment is that there is no such live document synchronization framework. In-house solution is the way to go, but many existing components can be used as part of the solution.

It is pretty simple to layer live document logic on top of WebSocket or any other message passing platform. Server just sends the document as a separate message when the connection is initiated and then after every change. Automated reconnection and some connection monitoring must be added to handle network failures.

Serialization at both ends is a separate problem targeted by many existing libraries. Detecting changes in server-side data structures (needed to initiate push) is yet another separate problem that has its own set of patterns and tools. Incremental updates and many other issues can be solved by intermediaries intercepting the connection.

This approach will work with current technology at the cost of extensive in-house glue code. It can be incrementally substituted with standard components as they become available.

WebSocket already includes resource URIs, routing, and a few other nice features. Useful intermediaries and libraries will likely emerge in the future. HTTP with text/event-stream MIME type is a possible future alternative to WebSocket. The advantage of HTTP is that existing tools can be reused with little modification.

I've completely thrown away the pattern of combining RPC pull with separate push channel despite rich tool support. Pushing everything in 0.5 RTT requires the push channel to use exactly the same technology as the pull channel, i.e. reverse RPC. Reverse RPC is like message passing except it introduces redundant returns, throws away useful connection semantics, and makes it hard to insert content-agnostic intermediaries into the stream.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top