Question

We have an internal .NET case management application that automatically creates a new case from an email. I want to be able to identify other emails that are related to the original email so we can prevent duplicate cases from being created.

I have observed that many, but not all, emails carry a Thread-Index header that looks useful.

Does anybody know of a straightforward algorithm or package that we could use?

Answer

As far as I know, there's not going to be a 100% foolproof solution, as not all email clients or gateways preserve or respect all headers.

However, you'll get a pretty high hit rate with the following:

  • Every email message should have a unique "Message-ID" field. Find this, and keep a record of it as part of the case. (See RFC-822, since superseded by RFC-2822 and RFC-5322.)

  • If you receive two messages with the same Message-ID, discard the second one as it's a duplicate.

  • Check for the "In-Reply-To" field. If the ID it contains matches a known Message-ID, you know the email is related to that case.

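  • The "References" header (which lists the Message-IDs of earlier messages in the thread) and the "Original-Message-ID" header have similar meanings, so check any IDs they contain against your known Message-IDs as well.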
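
The header checks above can be sketched roughly as follows. This is a minimal illustration in Python using only the standard library, not a drop-in for the .NET application; the `CaseIndex` class, its in-memory dictionary, and the integer case numbers are all hypothetical stand-ins for whatever storage the real case system uses.

```python
import re
from email import message_from_string

# Message-IDs are written in angle brackets, e.g. <abc123@example.com>.
MSG_ID = re.compile(r"<[^<>\s]+>")


def extract_ids(value):
    """Return every <...> token in a header value (empty list if absent)."""
    return MSG_ID.findall(str(value)) if value else []


class CaseIndex:
    """Hypothetical in-memory map from Message-ID to case number."""

    def __init__(self):
        self._case_by_msgid = {}
        self._next_case = 1

    def route(self, raw_email):
        """Classify a raw message as 'duplicate', 'related' or 'new'."""
        msg = message_from_string(raw_email)
        own_ids = extract_ids(msg.get("Message-ID"))
        own_id = own_ids[0] if own_ids else None

        # Same Message-ID seen before: the message itself is a duplicate.
        if own_id and own_id in self._case_by_msgid:
            return ("duplicate", self._case_by_msgid[own_id])

        # Collect every ID this message points back to.
        related = []
        for header in ("In-Reply-To", "References", "Original-Message-ID"):
            related.extend(extract_ids(msg.get(header)))

        case_id = next(
            (self._case_by_msgid[i] for i in related if i in self._case_by_msgid),
            None,
        )
        status = "related"
        if case_id is None:
            # Nothing matched, so open a new case.
            case_id = self._next_case
            self._next_case += 1
            status = "new"

        # Remember this message so later replies can be matched to it.
        if own_id:
            self._case_by_msgid[own_id] = case_id
        return (status, case_id)
```

Recording the Message-ID of every message attached to a case, not only the first one, lets a reply to a reply still find its way back to the original case.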

If your system ever generates emails, include the case ID in the subject line in a form that you can search for if you get an email back (e.g. [Case#20081114-01]); most people don't edit subject lines when replying.
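
A subject-line tag like this could be written and read back along these lines. It is only a sketch: the `tag_subject` and `find_case_id` helpers are hypothetical, and the pattern assumes case IDs always look like the `20081114-01` example (eight digits, a hyphen, then two or more digits).

```python
import re

# Assumed tag format, based on the example [Case#20081114-01].
CASE_TAG = re.compile(r"\[Case#(\d{8}-\d{2,})\]", re.IGNORECASE)


def tag_subject(subject, case_id):
    """Prefix an outgoing subject with the case tag unless one is present."""
    if CASE_TAG.search(subject):
        return subject
    return f"[Case#{case_id}] {subject}"


def find_case_id(subject):
    """Return the case ID embedded in a subject line, or None."""
    match = CASE_TAG.search(subject or "")
    return match.group(1) if match else None
```

Searching anywhere in the subject, instead of only at the start, means prefixes such as "RE:" or "Fwd:" added by mail clients do not hide the tag.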

The internet standards RFC-822, RFC-2076 and RFC-4021 may be useful further reading.

Given that some messages will always be missed (for whatever reason), you'll probably also want related features in your case management system, such as "Close as Duplicate Case" or "Merge with Duplicate Case", along with tools that make duplicates easier to find.

Licensed under: CC-BY-SA with attribution