How to merge user data after login?

Question 1

An excellent question that any app developer using user data should ask, and, sadly very few do :(

In fact, there are two completely independent questions here:

Q1 - At what stage require user to sign in/up?
Q2 - Data concurrency and conflict resolution (see below).

And here some analysis for each of the questions. Please excuse my extra passion coming from my own "frustrated user" experience. :)

Q1 is a pure usability question. To which the answer is actually obvious:

Avoid or delay to force the user sign in as much as possible!

Even the need to save state is not enough a reason by itself. If I am as user not interested in saving that state, then don't force me to sign! Please!

The only reason for you (as website) to justify forcing me to sign is when I (as user) want to save my data for later use. Here I speak as user having wasted time on signing only to find the site useless. If you want to get rid of too many users, that is the right way. In any other case - please, delay it as much as possible!

Why so many sites completely disregard such an obvious rule? The possible reasons I can see:

R1- developer friendly vs user friendly. Yes, it is developer friendly to require sign in right away, so we don't need to bother with concurrency (Q2). So we can save developer costs, time etc. But every saving comes at a cost! Which in this case is called User Experience. Which is not necessarily where you would like to look for saving. Especially, since the solution should not be that hard (see below).
R2 - Designer or Manager making the decision is an "indoor enthusiast" :) She lives happy life surrounded by super-fast computers with super-fast internet connection and can't imagine singing up can be that hard for any user. So why is it such a big deal? So many reasons:
It breaks the application flow. Sites living in previous century still replace the whole screen with sometimes rather lengthy form. Some forms are badly designed, some have erratic instructions, some simply don't work. Some have submit buttons that are for some reason disabled in the browser used. Some form designers have genius idea to lock certain fields with barely noticeable change or colour. Then don't show me the field if you don't want me to fill it!
If the site is serious about user's data, it must request Email and must verity it! Why? How else shall I get back to user who forgot all other credentials? Why verify? What if user mistyped the email? If I don't verify it, next time the user tries to recover password with her correct email, the recovery fails and all data are lost! Obvious, yet there are still sites out there not doing it. Then I need to wait till the verification email is received and click on, hopefully, well-formatted and uniquely identifiable link that does not break in my browser, nor get some funny characters due to broken encoding detection, making the whole link unusable.
The internet connection can be slow or broken, making every additional step a piece of pain. Even with good connection, it happens here and there that page suddenly takes much longer to load. Also the email may not arrive right away. Then impatient user starts furiously clicking the "resend verification" link. In which case 90% of sites resend their link with new token but also disable all previous tokens. Then several emails arrive in unpredictable order and poor user has to guess in vain, which one (and only one) is still valid. Now why those sites find it so hard to keep several tokens active, just for this case, is beyond my understanding.
Finally there is still this so hard to unlearn habit for sites to insist on the so-called "username". So now, beside my email, I have to think hard to come up with this unique username, different from any previous user! Thank you so much for making it sweet and easy! My own way of dealing with it is to use my email as username. Sadly, there are still sites not accepting it! Then what if some fun type used my email as his username? Not so unrealistic if your email is bill@gates.com. But why simply not use Email and Password to avoid all this mess?

Here some possible guidelines to relieve user's pain:

Only force me to sign in/up if you absolutely need and give me a chance to choose not to!
Make it one page form, so I know what I am up to and, needless to say, use as few input fields as possible. Ideally only Email and Password (possibly twice), no Username!
Show your sign in form as small window on top of your page without reloading, and allow me to get rid of it with single click away from that window. Don't force me to look for "close" button or, even worse, icon I could confuse for something else!
Account for user to click back/forth and reload buttons. Don't clear the form upon reload! Don't put clear button at all! It is too easy to click by accident. The data you are ask me to fill should not be so long in first place, that I could not re-enter it without the need of "assistance" to clear.

Now to question Q2. Here we have well known problem of conflict resolution that occurs any time two data need to be merged. For instance, the anonymous data and the registered user data, but also whenever two users modify the same data, or the same user modifies it from different devices at different times, or locally stored data conflict with server data, and so on.

But whatever the source is, the problem is always the same. We have two data, say two objects $obj1 and $obj2 and you need to produce your single merged object $obj3. The logic can be as simple as the rule that server's object always wins, or that the last modified object always wins, or the last modified object keys always win or any more complicated logic. This really depends on the nature of your application. But in each case, all you need to do is to write your logic function with arguments $obj1, $obj2 that returns $obj3.

A solution that will possibly work in many cases is to store timestamp on each object attribute (key) and let the latest changed key win at the moment of synchronisation. That accounts e.g. for the situation when the same user modifies different attributes when being anonymous from different devices.

Imagine I had modified keys A and B on device AA yesterday, then logged today from device BB to enter another B and saved it to the server, then switched back to my device AA, where I am anonymous, to enter yet another A without changing the old B from yesterday, and then realised I want to log in and synchronise. Then my local B is obviously old and should clearly not overwrite the value of B that I changed more recently on device BB. In this seemingly complicated case, the above solutions works seamlessly and effectively. In contrast, putting the timestamp only on whole objects would be wrong.

Now in some cases, it could make sense to keep both objects, and, e.g. distinguish them by adding extra properties, like in case 1 suggested in Radek's question. For instance, Dropbox adds something like "conflicted copy by user X" to the end of the file. Like in Dropbox case, this is sensible in case of collaboration apps, where users like to have some version control. However, in those cases, you as developer simply save two copies and let the users deal with that problem.

If on the other hand, you have to write a complicated logic based on user's data, having two different copies hanging around can be a nightmare. In that case, I would split data into two groups (e.g. create two objects out of one). The first group has data representing the state of the application as a whole, that is important to be unique. For that data I would use the conflict resolution as above or similar. Then the second group is user-specific, where I would store both data as two separate entries in the database, properly mark them (like Dropbox does), and then let users deal with the list of two (or more) entries of their project.

Finally, if that additional complication of database management makes the developer uneasy, and since Radek asked to give a resource reference, I want to "kill two flies with one shot" by mentioning the blog entry StackMob offline Sync, whose solution provides both database and user management functionality and so relieves the developer from that pain. Surely there is a lot more info to be found when searching for data concurrence, conflict resolution and the likes.

To conclude, I have to add the obligatory disclaimer, that all written here are merely my own thoughts and suggestions, that everyone should use at own risk and don't hold me responsible if you suddenly get too many happy users making your system crash :)

As I am myself working on an app, where I am implementing all those aspects, I am certainly very interested to hear other opinions and what else folks have to say on the subject.

Question 2

From my experience - both as a user of sites that require a login, and as a developer working with logged in users - I don't think I've ever seen a site behave this way.

The common pattern is to let a user be anonymous and the first time they do something that would require saving state, they are prompted to login. Then the previous action is remembered and the user can continue. For example, if they try to add something to their shopping cart, they are prompted to login and then after login, the item is in their cart.

I suppose some places would allow you to fill a cart and then login at which point the cart is associated with a concrete user.

I would create a SessionUser object that has the state of the site interaction and one field called UserId that is used to retrieve other things like name, address, etc.

With anonymous users, I would create the SessionUser object with an empty reference for UserId. This means we can't resolve a name or an address, but we can still save state. The actions they are performing, the pages they're viewing, etc.

Once they login we don't have to merge two objects, we just populate the UserId field in SessionUser and now we can traverse an object graph to get name, email, address or whatever else.