Question

I'm building a web service which receives emails from a number of CRM systems. Emails typically contain a text status, e.g. "Received" or "Completed", as well as a free-text comment.

The formats of the incoming emails differ, e.g. some systems call the status "Status: ZZZZZ" and some "Action: ZZZZZ". The free text sometimes appears before the status and sometimes after. Status codes will be mapped to my system's interpretation, and the comment is required too.

Moreover, I'd expect the formats to change over time, so a solution that is configurable, possibly by customers providing their own templates through a web interface, would be ideal.

The service is built using .NET C# MVC 3 but I'd be interested in general strategies as well as any specific libraries/tools/approaches.

I've never quite got my head around RegExp. I'll make a new effort in case it is indeed the way to go. :)

Solution

I would go with regex:

First example, if you only had messages of the form Status: ZZZZZ:

using System.Text.RegularExpressions;
// emailBody is assumed to hold the text of the incoming email
String status = Regex.Match(emailBody, @"(?<=Status: ).*").Value;
// Explanation of "(?<=Status: ).*" :
// (?<=       Start of the positive look-behind group: the text inside it
//            must precede the match but won't appear in the returned string
// Status:    The literal label that introduces the status in the email
// )          End of the positive look-behind group
// .*         Matches every character up to the end of the line

Second example, if you had both Status: ZZZZZ and Action: ZZZZZ messages:

// emailBody is assumed to hold the text of the incoming email
String status = Regex.Match(emailBody, @"(?<=(Status|Action): ).*").Value;
// We added (Status|Action), which allows the positive look-behind text to be
// either 'Status: ' or 'Action: '
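The two patterns above can be wrapped in a small helper. This is only a minimal sketch: the class name StatusExtractor, the method ExtractStatus and the choice to return null when no label is found are assumptions made for illustration, not part of any library.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class StatusExtractor
{
    // Same pattern as above: a look-behind for "Status: " or "Action: ",
    // followed by the rest of that line.
    private static readonly Regex StatusPattern =
        new Regex(@"(?<=(Status|Action): ).*");

    // Returns the status text, or null when the email contains neither label.
    public static string ExtractStatus(string emailBody)
    {
        Match match = StatusPattern.Match(emailBody);
        if (!match.Success)
        {
            return null;
        }
        // '.' does not match '\n' but it does match '\r', so trim the
        // carriage return that CRLF line endings leave at the end.
        return match.Value.Trim();
    }
}
```

Checking match.Success before reading the value avoids treating an email without a status line as an email with an empty status.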

Now, if you want to let users provide their own format, you could come up with something like:

String userEntry = GetUserEntry(); // Get the text submitted by the user
String userFormatText = Regex.Escape(userEntry);
// emailBody is assumed to hold the text of the incoming email
String status = Regex.Match(emailBody, @"(?<=" + userFormatText + ").*").Value;

That lets users submit their own format text, such as Status:, Action:, or even This is my friggin format, now please read the status -->...

The Regex.Escape(userEntry) part is important: it ensures that the user can't break your regex by submitting special characters such as \, ?, *...
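As a rough sketch of the escaping step, the helper below takes the user's prefix as a plain parameter. The names UserFormatExtractor and userPrefix are hypothetical, and returning null for "no match" is an assumption rather than a requirement.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class UserFormatExtractor
{
    // userPrefix is the literal text that the customer says precedes the
    // status, e.g. "Status: " (assumed to come from a web form).
    public static string ExtractStatus(string emailBody, string userPrefix)
    {
        // Escape first, so characters such as '?' or '*' in the prefix
        // are matched literally instead of being read as regex syntax.
        string pattern = "(?<=" + Regex.Escape(userPrefix) + ").*";
        Match match = Regex.Match(emailBody, pattern);
        return match.Success ? match.Value.Trim() : null;
    }
}
```

With this sketch, a prefix such as "State?* " matches only that exact text, so an email line "State?* Closed" yields "Closed".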


To find out whether the status value comes before or after the format text, you have several options:

  • You could ask the user where their status value is, and then build your regex accordingly:

    if (statusValueIsAfter) {
        // Example: "Status: Closed"
        regexPattern = @"(?<=Status: ).*";
    } else {
        // Example: "Closed:Status"
        regexPattern = @".*(?=:Status)";  // We use here a positive look-AHEAD
    }
    
  • Or you could be smarter and introduce a system of tags for the user entry. For instance, the user submits Status: <value> or <value>=The status, and you build the regex by replacing the <value> tag with the part of the pattern that captures the status.
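The tag idea could look roughly like the sketch below. It rests on a few assumptions: the tag is spelled exactly <value>, the status sits on a line of its own that matches the whole template, and the names TemplateExtractor, BuildRegex and ExtractStatus are invented for illustration. A named capture group is used here in place of the look-behind and look-ahead, so the same code handles a value before or after the label.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class TemplateExtractor
{
    private const string ValueTag = "<value>";

    // Turns a template such as "Status: <value>" into a regex that matches
    // one whole line and captures the tagged part in a group named "value".
    public static Regex BuildRegex(string template)
    {
        int tagIndex = template.IndexOf(ValueTag, StringComparison.Ordinal);
        if (tagIndex < 0)
        {
            throw new ArgumentException("Template must contain " + ValueTag, "template");
        }
        string before = template.Substring(0, tagIndex);
        string after = template.Substring(tagIndex + ValueTag.Length);

        // Escape the literal parts; only the tag becomes regex syntax.
        // \r?$ tolerates CRLF line endings when Multiline is on.
        string pattern = "^" + Regex.Escape(before)
                       + "(?<value>.+?)"
                       + Regex.Escape(after) + @"\r?$";
        return new Regex(pattern, RegexOptions.Multiline);
    }

    // Returns the captured status, or null when no line fits the template.
    public static string ExtractStatus(string emailBody, string template)
    {
        Match match = BuildRegex(template).Match(emailBody);
        return match.Success ? match.Groups["value"].Value : null;
    }
}
```

Because the template is split around the tag and each literal part is escaped separately, customers can type any punctuation they like around <value> without affecting the generated pattern.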

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow