How to deal with abandoned idempotent operations?

https://softwareengineering.stackexchange.com/questions/420692

20-03-2021

Question

I have implemented idempotent order placement (mostly to avoid accidental double submissions) but I am not sure how to handle incomplete operations. Example scenario:

1. User tries to place an order.
2. An order instance with status PENDING_PAYMENT is created in the DB.
3. Order payment succeeds (3rd party processor, supporting idempotence keys, e.g. Stripe).
4. My DB fails to update order status to PAID (e.g. it suddenly went down for a minute) and user receives some error.

Since the whole operation is idempotent, it is safe to retry the operation, and some (most?) users would choose to do that.

But what if the user abandons the operation?

I could implement a Completer process, which would push all incomplete operations through to completion. However, this might come as a surprise to the user.
I could combine the Completer with the assumption that it will eventually place the order successfully, in which case I wouldn't even have to alert the user. However, in the rare case that the Completer itself fails, the outcome would be even more surprising: an order that once looked successful would now appear to have failed.

Questions:

1. What are some ways of dealing with this situation?
2. What would the user typically expect?

2.1. Should I let the user know exactly what happened (i.e. payment ok, status not ok), inform the user of a generic failure (something went wrong, please retry), or let them know nothing at all?
2.2. If I inform the user of a generic error, they might decide to update their basket and then resubmit the order. I was thinking that the way to deal with this is to simply generate a fresh idempotence key and create a second order. What are the alternatives?

Additional details:

I don't expect a high rate of failures, but I want to be prepared.
I am not dealing with big money or sensitive data - consider this a simple e-shop.

Update

I actually followed this article from Brandur Leach whilst implementing my idempotent operations, in case you're interested: https://brandur.org/idempotency-keys.

I contacted Brandur directly regarding my problem and you can see what he had to say for yourselves: https://github.com/brandur/sorg/issues/268. The gist is that I should always push all operations to completion, which agrees with the answers here. I can then decide what to do with the result. There may be multiple ways of informing the user too.

Solution

If I order an item from a website, can see in my online banking that my payment has gone through, and the website still says "Payment is pending", I'm not likely to walk away. I'm also not likely to retry the order, since I have no idea that you're using an idempotent payment process. I'm far more likely to contact your support channel and complain.

But suppose the website says "Payment is pending, please wait..." and periodically refreshes, while in the background you have a task that retries all orders that have been sitting in PENDING_PAYMENT for too long (which is perfectly safe to do because it's idempotent). Then I'm most likely to sit there and stare at it until the message switches to "Thanks for your payment", which (in the scenario we're considering) will happen shortly, when the background task retries successfully.
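The background task described above could look roughly like the sketch below. It is an illustration only: the `Order` fields, the two-minute threshold, and the `charge` callable (a stand-in for a payment call that accepts an idempotency key) are all assumptions, not details from the question.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, List

PENDING_PAYMENT = "PENDING_PAYMENT"
PAID = "PAID"


@dataclass
class Order:
    # Hypothetical shape of an order row.
    id: str
    status: str
    created_at: datetime
    idempotency_key: str


def complete_stale_orders(
    orders: Iterable[Order],
    charge: Callable[[str], bool],
    now: datetime,
    max_age: timedelta = timedelta(minutes=2),  # assumed threshold
) -> List[str]:
    """Retry every order stuck in PENDING_PAYMENT for longer than max_age.

    `charge` is assumed to be idempotent: calling it again with the same
    key returns the result of the original payment instead of charging twice.
    """
    completed = []
    for order in orders:
        if order.status != PENDING_PAYMENT:
            continue
        if now - order.created_at < max_age:
            continue  # too fresh; the original request may still be running
        try:
            if charge(order.idempotency_key):
                order.status = PAID
                completed.append(order.id)
        except Exception:
            # Leave the order untouched; the next sweep will try again.
            continue
    return completed
```

Running such a sweep on a timer is what lets the "please wait" page eventually flip to the success message without any action from the user.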

OTHER TIPS

It depends on what you tell the user.

If you tell the user it failed, you’re done until the user resubmits, which could be done with a single click.

If you tell the user to wait, you can simply keep them waiting while you resubmit. You could make this seamless, or you could keep them updated by explaining the delay.

Automatically retrying without the user's consent, after you have already announced a failure, will leave the user angry and confused.

Consider unwinding the idempotent transaction to the pre-buy state or a "may try again" state on error. This leaves you with two or three terminal states to handle (in total).

By doing this you can also safely reuse the id. Reusing the id preserves the duplicate prevention benefits of the idempotent approach.

You may discover that reliably unwinding the transaction is very hard. If so, it may be time to rethink the design and/or use of idempotency.
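One way to make the small set of resting states explicit is a transition table. The sketch below is illustrative only: the state names `PRE_BUY` and `MAY_TRY_AGAIN` and the allowed transitions are assumptions chosen to mirror the wording above, not part of the original design.

```python
from enum import Enum


class OrderState(Enum):
    PRE_BUY = "PRE_BUY"                  # nothing attempted yet, or fully unwound
    PENDING_PAYMENT = "PENDING_PAYMENT"  # operation in flight
    PAID = "PAID"                        # completed
    MAY_TRY_AGAIN = "MAY_TRY_AGAIN"      # failed, user may resubmit with the same id


# Hypothetical transition table: an in-flight order can only complete or unwind.
ALLOWED = {
    OrderState.PRE_BUY: {OrderState.PENDING_PAYMENT},
    OrderState.PENDING_PAYMENT: {
        OrderState.PAID,
        OrderState.PRE_BUY,
        OrderState.MAY_TRY_AGAIN,
    },
    OrderState.MAY_TRY_AGAIN: {OrderState.PENDING_PAYMENT},
    OrderState.PAID: set(),
}


def move(current: OrderState, target: OrderState) -> OrderState:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Writing the table down tends to show quickly whether every error path really has somewhere to land.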

Addition #1 (based on comments): Consider starting with a database update to a status such as PAYMENT_STARTING, and include a timestamp. If this fails, you're at "sorry we're offline right now, please try again later."

Next, call the gateway itself. If this fails you're at "something went wrong, your credit card cannot be charged, please try again later."

Finally, update the database to e.g. PAYMENT_COMPLETE. If this fails you are responsible for retrying based on e.g. the timestamp from the first update. The ideas in other answers about how to handle the user experience are valid.

However you choose to handle the user experience, the goal should be a solution that converges on one of two states: order completed or fully unwound.
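The three steps in Addition #1 can be sketched as follows. This is a minimal illustration under stated assumptions: `InMemoryStore` is a hypothetical stand-in for the orders table, `charge` stands in for the gateway call, and the returned strings are made-up labels for the outcomes described above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Optional


@dataclass
class OrderRow:
    status: str = "NEW"
    payment_started_at: Optional[datetime] = None


class InMemoryStore:
    """Hypothetical stand-in for the orders table."""

    def __init__(self) -> None:
        self.rows: Dict[str, OrderRow] = {}
        self.fail_next_write = False  # lets a caller simulate a DB outage

    def set_status(self, order_id: str, status: str,
                   started_at: Optional[datetime] = None) -> None:
        if self.fail_next_write:
            self.fail_next_write = False
            raise ConnectionError("database unavailable")
        row = self.rows.setdefault(order_id, OrderRow())
        row.status = status
        if started_at is not None:
            row.payment_started_at = started_at


def place_order(store: InMemoryStore, charge: Callable[[str], None],
                order_id: str, now: datetime) -> str:
    # Step 1: record intent plus a timestamp before touching the gateway.
    try:
        store.set_status(order_id, "PAYMENT_STARTING", started_at=now)
    except ConnectionError:
        return "offline"  # "sorry we're offline right now"
    # Step 2: call the gateway (assumed to raise on failure).
    try:
        charge(order_id)
    except Exception:
        return "payment_failed"  # "your credit card cannot be charged"
    # Step 3: record success. If this write fails, the payment went through
    # but is not recorded, so a background retry must finish the job using
    # the timestamp written in step 1.
    try:
        store.set_status(order_id, "PAYMENT_COMPLETE")
    except ConnectionError:
        return "pending"
    return "complete"
```

Only the `"pending"` outcome needs later attention from the server; the other outcomes leave the order either complete or safe for the user to retry.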

Addition #2 (based on comments): It seems that an underlying question is how to handle unreliable calls, e.g. in a public cloud. One common approach is a retry library such as Polly (for .NET).
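To show what such a library does under the hood, here is a hand-rolled retry helper with exponential backoff. It is not Polly's API, only a small sketch of the same idea; the function name, parameters, and defaults are assumptions.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or `attempts` is exhausted.

    Waits base_delay, then 2 * base_delay, 4 * base_delay, ... between tries.
    Only safe to use when `operation` is idempotent.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

A final status update could then be wrapped as `retry(lambda: update_status(order_id, "PAID"))`, where `update_status` is whatever function performs the database write (hypothetical here).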

Licensed under: CC-BY-SA with attribution
