Question

I have not found a regexp for this. I need to validate the "Message-ID:" value of an email. It is similar to an email address validation regexp but much simpler, without most of the edge cases that email addresses allow. From RFC 2822:

msg-id          =       [CFWS] "<" id-left "@" id-right ">" [CFWS] 
id-left         =       dot-atom-text / no-fold-quote / obs-id-left
id-right        =       dot-atom-text / no-fold-literal / obs-id-right
no-fold-quote   =       DQUOTE *(qtext / quoted-pair) DQUOTE
no-fold-literal =       "[" *(dtext / quoted-pair) "]"

Let's say the outer <> are optional. dot-atom-text and the other missing definitions can be found in RFC 2822.

I am not proficient in regex and would prefer to use an already tested one, if one exists.

Solution 2

As I could not find one, I ended up implementing it myself. It is not a proper validation as per RFC 2822, but a good enough approximation for now:

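import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Apache Commons Lang 3; on the 2.x line the package is org.apache.commons.lang
import org.apache.commons.lang3.StringUtils;

// dot-atom-text "@" dot-atom-text; the quoted and literal forms are not covered
static final String VALIDMIDPATTERN = "[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*";
// compiled once; CASE_INSENSITIVE because the character classes list only a-z
private static final Pattern patvalidmid = Pattern.compile(VALIDMIDPATTERN, Pattern.CASE_INSENSITIVE);

public static boolean isMessageIdValid(String midt) {
    // a null value would otherwise fail at trim() below
    if (midt == null) {
        return false;
    }
    String mid = midt;
    // at most one "<" and one ">" are allowed
    if (StringUtils.countMatches(mid, "<") > 1)
        return false;
    if (StringUtils.countMatches(mid, ">") > 1)
        return false;
    // extract the id from the optional <>
    if (StringUtils.containsAny(mid, "<>")) {
        mid = StringUtils.substringBetween(mid, "<", ">");
        if (StringUtils.isBlank(mid)) {
            return false;
        }
    }
    // consecutive dots are never valid in dot-atom-text
    if (StringUtils.contains(mid, "..")) {
        return false;
    }
    mid = mid.trim();
    // now validate against the pattern; the whole string must match
    Matcher m = patvalidmid.matcher(mid);
    return m.matches();
}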

OTHER TIPS

If anyone's interested, one of our senior architects worked through the many layers of RFC 2822 and came up with the following regex, which includes quoting on the left and right sides. The spec says that new implementations should not generate the obsolete forms (obs-id-left and obs-id-right), so this regex does not allow them:

((([a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*)|("(([\x01-\x08\x0B\x0C\x0E-\x1F\x7F]|[\x21\x23-\x5B\x5D-\x7E])|(\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*"))@(([a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*)|(\[(([\x01-\x08\x0B\x0C\x0E-\x1F\x7F]|[\x21-\x5A\x5E-\x7E])|(\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*\])))
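
A regex of this size is easier to review when it is assembled from the named grammar rules. The following is only a sketch of that idea in Java, not the original author's code: the class name MessageIdRegex, the constant names and the matches method are illustrative assumptions, and the character ranges are copied from the regex above. Like that regex, it expects the id without the surrounding <> and without CFWS.

```java
import java.util.regex.Pattern;

final class MessageIdRegex {
    // atext: letters, digits and the punctuation allowed in an atom
    static final String ATEXT = "[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]";
    // dot-atom-text = 1*atext *("." 1*atext)
    static final String DOT_ATOM_TEXT = ATEXT + "+(?:\\." + ATEXT + "+)*";
    // NO-WS-CTL: control characters shared by qtext and dtext
    static final String NO_WS_CTL = "\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F\\x7F";
    // qtext: NO-WS-CTL plus printable ASCII except '"' and '\'
    static final String QTEXT = "[" + NO_WS_CTL + "\\x21\\x23-\\x5B\\x5D-\\x7E]";
    // dtext: NO-WS-CTL plus printable ASCII except '[', '\' and ']'
    static final String DTEXT = "[" + NO_WS_CTL + "\\x21-\\x5A\\x5E-\\x7E]";
    // quoted-pair: a backslash followed by any character allowed by "text"
    static final String QUOTED_PAIR = "\\\\[\\x01-\\x09\\x0B\\x0C\\x0E-\\x7F]";
    // no-fold-quote = DQUOTE *(qtext / quoted-pair) DQUOTE
    static final String NO_FOLD_QUOTE = "\"(?:" + QTEXT + "|" + QUOTED_PAIR + ")*\"";
    // no-fold-literal = "[" *(dtext / quoted-pair) "]"
    static final String NO_FOLD_LITERAL = "\\[(?:" + DTEXT + "|" + QUOTED_PAIR + ")*\\]";

    // id-left "@" id-right, without the obsolete alternatives
    static final Pattern MSG_ID = Pattern.compile(
            "(?:" + DOT_ATOM_TEXT + "|" + NO_FOLD_QUOTE + ")"
            + "@"
            + "(?:" + DOT_ATOM_TEXT + "|" + NO_FOLD_LITERAL + ")");

    // True when the whole string is id-left "@" id-right.
    static boolean matches(String idWithoutBrackets) {
        return MSG_ID.matcher(idWithoutBrackets).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("abc.def@example.com"));
        System.out.println(matches("\"quoted\"@[127.0.0.1]"));
        System.out.println(matches("a..b@example.com"));
    }
}
```

Using matches() rather than find() matters here: the pattern has no anchors, so find() would accept any string that merely contains a valid id.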

It is not possible to perfectly match an RFC 2822 Message-ID using standard regular expressions, because the CFWS rule allows comments to nest and a regular expression cannot count arbitrarily deep nesting. For example:

<foo@bar.com> (comment (another comment))
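
One way around this is to strip the comments with a small depth counter first and only then apply a regex to what remains. The sketch below illustrates that idea; the class name CfwsStripper and the method stripComments are made up for this example. It assumes that parentheses outside <...> always belong to comments, and it does not try to handle a ">" inside a quoted id-left.

```java
final class CfwsStripper {
    /**
     * Removes (possibly nested) comments found outside the angle brackets
     * and trims the result. Returns null when parentheses are unbalanced.
     */
    static String stripComments(String input) {
        StringBuilder out = new StringBuilder();
        int depth = 0;            // current comment nesting level
        boolean inAngle = false;  // true between '<' and '>'
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (inAngle) {
                // inside <...> everything is copied verbatim
                out.append(c);
                if (c == '>') {
                    inAngle = false;
                }
            } else if (depth > 0 && c == '\\' && i + 1 < input.length()) {
                i++; // quoted-pair inside a comment: skip the escaped character too
            } else if (c == '(') {
                depth++;
            } else if (c == ')') {
                if (depth == 0) {
                    return null; // closing parenthesis without an opener
                }
                depth--;
            } else if (depth == 0) {
                out.append(c);
                if (c == '<') {
                    inAngle = true;
                }
            }
            // any other character at depth > 0 is comment text and is dropped
        }
        return depth == 0 ? out.toString().trim() : null;
    }

    public static void main(String[] args) {
        System.out.println(stripComments("<foo@bar.com> (comment (another comment))"));
    }
}
```

The returned string still has to be validated; this step only removes the part of the grammar that a regex cannot express.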

Try something like this, compiled case-insensitively: ^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}$
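
A pattern of this shape lists only uppercase letters, so it needs the case-insensitive flag, and the dot before the final label has to be escaped or it matches any character. A minimal Java sketch under those assumptions follows; the class name SimpleIdCheck and the method looksLikeId are illustrative. This is an email-style approximation rather than the RFC 2822 grammar: it rejects ids whose right side has no dot, such as abc@localhost, and it accepts consecutive dots on the left side.

```java
import java.util.regex.Pattern;

final class SimpleIdCheck {
    // The dot before the final label is escaped; CASE_INSENSITIVE covers a-z.
    static final Pattern SIMPLE = Pattern.compile(
            "^[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}$",
            Pattern.CASE_INSENSITIVE);

    // True when the whole string has the shape local@domain.tld.
    static boolean looksLikeId(String id) {
        return SIMPLE.matcher(id).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeId("abc123@mail.example.com"));
        System.out.println(looksLikeId("abc@localhost"));
    }
}
```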

Licensed under: CC-BY-SA with attribution