Because you are mapping messages to the saga data via the ClientRef property, you need to tell the persistence (Raven in this case) that this property is unique. What is probably happening is a race condition: in some cases the query that the second message runs against the Raven index returns stale data, so NServiceBus assumes no saga data exists and creates a new instance.
This should fix your issue:
[Unique]
public Guid ClientRef { get; set; }
With this information, the Raven saga persister will create an additional document keyed on this property. Loading by Id in Raven is fully atomic, unlike querying an index, so the second message will be sure to find it.
If you were using another persistence medium like NHibernate, the same attribute would be used to construct a unique index on that column.
Edit based on comment
The unique constraint document and your saga data will be fully consistent, so depending on the timing of incoming messages, one of three things will happen.
- The message is truly the first to arrive and be processed, so no saga data is found and it is created.
- The message is the second to arrive, so it looks for the saga data, finds it, and processes successfully.
- The second message arrives very close to the first, so both are processed on separate threads at the same time. Both threads look for the saga data and find nothing, so they both begin to process. The one that finishes first commits successfully and saves its saga data. The one that finishes second attempts to save its saga data, but finds that while it was working the other thread has moved its cheese, so Raven throws a concurrency exception. Your message goes back on the queue and is retried, and now that the saga data exists, the retry behaves like the second scenario.
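
To make the third scenario concrete, here is a minimal sketch that simulates it in memory. It does not use NServiceBus or RavenDB at all: `SagaRaceDemo`, `ConcurrencyConflictException`, and the dictionary-backed store are hypothetical stand-ins, with `ConcurrentDictionary.TryAdd` playing the role of the atomic unique constraint document and the `while` loop playing the role of the message being put back on the queue. A `Barrier` forces both threads to look for the saga before either one saves, which is the unlucky timing described above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical exception standing in for the persister's concurrency exception.
public sealed class ConcurrencyConflictException : Exception { }

public static class SagaRaceDemo
{
    // Hypothetical in-memory saga store keyed by ClientRef.
    // The value counts how many messages the saga has handled.
    private static ConcurrentDictionary<Guid, int> store;
    private static int retries;

    private static void Handle(Guid clientRef, Barrier bothHaveLooked)
    {
        bool firstAttempt = true;
        while (true) // each iteration stands in for one delivery of the message
        {
            try
            {
                bool found = store.ContainsKey(clientRef);
                if (firstAttempt)
                {
                    // Make sure both threads have looked before either saves.
                    firstAttempt = false;
                    bothHaveLooked.SignalAndWait();
                }
                if (found)
                {
                    // Saga exists: process the message against it.
                    store.AddOrUpdate(clientRef, 1, (key, handled) => handled + 1);
                }
                else if (!store.TryAdd(clientRef, 1))
                {
                    // The atomic insert on the unique key lost the race.
                    throw new ConcurrencyConflictException();
                }
                return;
            }
            catch (ConcurrencyConflictException)
            {
                // Simulates the message going back on the queue for a retry.
                Interlocked.Increment(ref retries);
            }
        }
    }

    public static (int Sagas, int Handled, int Retries) Run()
    {
        store = new ConcurrentDictionary<Guid, int>();
        retries = 0;
        var clientRef = Guid.NewGuid();
        using (var barrier = new Barrier(2))
        {
            var first = new Thread(() => Handle(clientRef, barrier));
            var second = new Thread(() => Handle(clientRef, barrier));
            first.Start();
            second.Start();
            first.Join();
            second.Join();
        }
        return (store.Count, store[clientRef], retries);
    }

    public static void Main()
    {
        var result = Run();
        Console.WriteLine($"sagas={result.Sagas}, handled={result.Handled}, retries={result.Retries}");
    }
}
```

In this sketch exactly one saga ends up in the store, both messages are handled by it, and one of the two threads goes through a single retry. Without the atomic insert on the unique key (for example, if the second thread simply wrote a new entry under a fresh Id), both threads would succeed and you would end up with the duplicate saga data you are seeing.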