Syncing objects between two disparate systems, best approach?

https://stackoverflow.com/questions/636675

10-07-2019
|

Question

I am working on syncing two business objects between an iPhone and a Web site using an XML-based payload and would love to solicit some ideas for an optimal routine.

The nature of this question is fairly generic though and I can see it being applicable to a variety of different systems that need to sync business objects between a web entity and a client (desktop, mobile phone, etc.)

The business objects can be edited, deleted, and updated on both sides. Both sides can store the object locally but the sync is only initiated on the iPhone side for disconnected viewing. All objects have an updated_at and created_at timestamp and are backed by an RDBMS on both sides (SQLite on the iPhone side and MySQL on the web... again I don't think this matters much) and the phone does record the last time a sync was attempted. Otherwise, no other data is stored (at the moment).

What algorithm would you use to minimize network chatter between the systems for syncing? How would you handle deletes if "soft-deletes" are not an option? What data model changes would you add to facilite this?

Solution

The simplest approach: when syncing, transfer all records where updated_at >= @last_sync_at. Down side: this approach doesn't tolerate clock skew very well at all.

It is probably safer to keep a version number column that is incremented each time a row is updated (so that clock skew doesn't foul your sync process) and a last-synced version number (so that potentially conflicting changes can be identified). To make this bandwidth-efficient, keep a cache in each database of the last version sent to each replication peer so that only modified rows need to be transmitted. If this is going to be a star topology, the leaves can use a simplified schema where the last synced version is stored in each table.

Some form of soft-deletes are required in order to support sync of deletes, however this can be in the form of a "tombstone" record which contains only the key of the deleted row. Tombstones can only be safely deleted once you are sure that all replicas have processed them, otherwise it is possible for a straggling replica to resurrect a record you thought was deleted.

OTHER TIPS

So I think in summary your questions relate to disconnected synchronization.

So here is what I think should happen:

Initial Sync You retrieve the data and any information associated with it (row versions, file checksums etc). it is important you store this information and leave it pristine until the next succesful sync. Changes should be made on a COPY of this data.

Tracking Changes If you are dealing with database rows, the idea is, you basically have to track insert, update and delete operations. If you are dealing with text files like xml, then its slightly more complicated. If it likely that multiple users will edit this file at the same time, then you would have to have a diff tool, so conflicts can be detected in a more granular level (instead of the whole file).

Checking for conflicts Again if you are just dealing with database rows, conflicts are easy to detect. You can have another column that increments whenever the row is updated (i think mssql has this builtin not sure about mysql). So if the copy you have has a different number than what's on the server, then you have a conflict. For files or strings, a checksum will do the job. I suppose you could also use modified date but make sure that you have a very precise and accurate measurement to prevent misses. for example: lets say I retrieve a file and you save it as soon as I retrieved it. Lets say the time difference is a 1 millisecond. I then make changes to file then I try to save it. If the recorded last modified time is accurate only to 10 milliseconds, there is a good chance that the file I retrieved will have the same modified date as the one you saved so the program thinks theres no conflict and overwrites your changes. So I generally don't use this method just to be on the safe side. On the other hand the chances of a checksum/hash collision after a minor modification is close to none.

Resolving conflicts Now this is the tricky part. If this is an automated process, then you would have to assess the situation and decide whether you want to overwrite the changes, lose your changes or retrieve the data from the server again and attempt to redo the changes. Luckily for you, it seems that there will be human interaction. But its still a lot of pain to code. If you are dealing with database rows, you can check each individual column and compare it against the data in the server and present it to the user. The idea is to present conflicts to the user in a very granular way so as to not overwhelm them. Most conflicts have very small differences in many different places so present it to the user one small difference at a time. So for text files, its almost the same but more a hundred times more complicated. So basically you would have to create or use a diff tool (Text comparison is a whole different subject and is too broad to mention here) that lets you know of the small changes in the file and where they are in a similar fashion as in a database: where text was inserted, deleted or edited. Then present that to the user in the same way. so basically for each small conflict, the user would have to choose whether to discard their changes, overwrite changes in the server or perform a manual edit before sending to the server.

So if you have done things right, the user should be given a list of conflicts if there are any. These conflicts should be granular enough for the user to decide quickly. So for example, the conflict is a spelling change from, it would be easier for the user to choose from word spellings in contrast to giving the user the whole paragraph and telling him that there was a change and that they have to decide what to do, the user would then have to hunt for this small misspelling.

Other considerations: Data Validation - keep in mind that you have to perform validation after resolving conflicts since the data might have changed Text Comparison - like I said, this is a big subject. so google it! Disconnected Synchronization - I think there are a few articles out there.

Source: https://softwareengineering.stackexchange.com/questions/94634/synchronization-web-service-methodologies-or-papers

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow