Question

I'm reading email headers (in Node.js, for those keeping score) and they are VERY varied. E-mail addresses in the To field look like:

"Jake Smart" <jake@smart.com>, jack@smart.com, "Development, Business" <bizdev@smart.com>

and a variety of other formats. Is there any way to parse all of this out?

Here's my first stab:

  1. Run a split() on , to break up the different people into an array
  2. For each item, see if there's a < or ".
  3. If there's a <, then parse out the email
  4. If there's a ", then parse out the name
  5. For the name, if there's a ,, then split to get Last, First names.

If I first do a split on the ,, then the Development, Business will cause a split error. Spaces are also inconsistent. Plus, there may be more e-mail address formats that come through in headers that I haven't seen before. Is there any way (or maybe an awesome Node.js library) that will do all of this for me?


Solution

There's an npm module for this: mimelib (or mimelib-noiconv if you are on Windows or don't want to compile node-iconv).

npm install mimelib-noiconv

And the usage would be:

var mimelib = require("mimelib-noiconv");
var addressStr = 'jack@smart.com, "Development, Business" <bizdev@smart.com>';
var addresses = mimelib.parseAddresses(addressStr);

console.log(addresses);
// [{ address: 'jack@smart.com', name: '' },
//  { address: 'bizdev@smart.com', name: 'Development, Business' }]
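
Step 5 from the question (splitting "Last, First" names) can then run on the parsed result instead of the raw header. The sketch below assumes only the result shape printed above, an array of { address, name } objects; splitDisplayName is a hypothetical helper, not part of mimelib, and treating every comma as a "Last, First" separator is a guess that fails for names such as "Development, Business".

```javascript
// Hypothetical helper (not part of mimelib): splits a "Last, First" display name.
// Names without a comma are left unsplit, because "First Last" cannot be told apart reliably.
function splitDisplayName(name) {
  const comma = name.indexOf(",");
  if (comma === -1) {
    return { display: name.trim(), last: null, first: null };
  }
  return {
    display: name.trim(),
    last: name.slice(0, comma).trim(),
    first: name.slice(comma + 1).trim(),
  };
}

// Sample data copied from the console output shown above.
const addresses = [
  { address: "jack@smart.com", name: "" },
  { address: "bizdev@smart.com", name: "Development, Business" },
];

// Attach the split name parts to each parsed entry.
const people = addresses.map(({ address, name }) => ({
  address,
  ...splitDisplayName(name),
}));

console.log(people);
```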

OTHER TIPS

The actual address format is pretty complicated (see http://tools.ietf.org/html/rfc2822#page-15), but here is a regex that works on the example from the question. I can't promise it will handle every valid header, though.

const str = "...";
const pat = /(?:"([^"]+)")? ?<?(.*?@[^>,]+)>?,? ?/g;

let m;
while (m = pat.exec(str)) {
  const name = m[1];
  const mail = m[2];

  // Do whatever you need.
}
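
To make the loop above easier to try out, here is a minimal sketch that wraps the same pattern in a function and runs it on the header from the question. The function name parseAddressList and the { name, address } result shape are assumptions for illustration only.

```javascript
// Hypothetical wrapper around the pattern shown above.
function parseAddressList(str) {
  // A fresh RegExp per call keeps lastIndex state from leaking between calls.
  const pat = /(?:"([^"]+)")? ?<?(.*?@[^>,]+)>?,? ?/g;
  const result = [];
  let m;
  while ((m = pat.exec(str)) !== null) {
    // m[1] is the quoted display name (undefined when absent), m[2] the address.
    result.push({ name: m[1] || "", address: m[2] });
  }
  return result;
}

const header =
  '"Jake Smart" <jake@smart.com>, jack@smart.com, "Development, Business" <bizdev@smart.com>';

console.log(parseAddressList(header));
// [ { name: 'Jake Smart', address: 'jake@smart.com' },
//   { name: '', address: 'jack@smart.com' },
//   { name: 'Development, Business', address: 'bizdev@smart.com' } ]
```

One limitation to be aware of: an unquoted display name, as in Jake Smart <jake@smart.com>, is swallowed into the captured address, so the pattern is only safe for quoted names and bare addresses.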

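I'd try to do it all in a single pass over the string (for performance). I just threw this together with limited testing; it handles quoted names, <address> parts and bare addresses, and ignores unquoted display names:

var header = "\"Jake Smart\" <jake@smart.com>, jack@smart.com, \"Development, Business\" <bizdev@smart.com>";
console.log(header);

// States: -1 = between tokens, 0 = inside a quoted name, 1 = inside <address>.
var info = [];   // one [name, address] pair per recipient
var name = "";
var temp = "";
var state = -1;
// Run one step past the end and treat it as a comma, so a trailing bare address is flushed.
for (var i = 0; i <= header.length; i++) {
  var c = i < header.length ? header[i] : ",";
  if (state === 0) {
    if (c === "\"") {
      name = temp;
      temp = "";
      state = -1;
    } else {
      temp += c;
    }
  } else if (state === 1) {
    if (c === ">") {
      info.push([name, temp]);
      name = "";
      temp = "";
      state = -1;
    } else {
      temp += c;
    }
  } else {
    if (c === "<") {
      temp = "";
      state = 1;
    } else if (c === "\"") {
      temp = "";
      state = 0;
    } else if (c === ",") {
      // A comma outside quotes and brackets ends an entry; anything collected is a bare address.
      if (temp.trim() !== "") {
        info.push([name, temp.trim()]);
      }
      name = "";
      temp = "";
    } else {
      temp += c;
    }
  }
}

console.log("INFO:", info);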

For something complete, you should port this to JS: http://cpansearch.perl.org/src/RJBS/Email-Address-1.895/lib/Email/Address.pm

It gives you all the parts you need. The tricky bit is just the set of regexps at the start.
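
To give a feel for what "the set of regexps at the start" means, here is a minimal sketch of composing a pattern from small named pieces in JavaScript. It is not a port of Email::Address: it covers only a simplified addr-spec (dot-atom "@" dot-atom, following the atext and dot-atom definitions in RFC 2822) and leaves out quoted local parts, domain literals, comments and display names. All names in it are made up for illustration.

```javascript
// atext from RFC 2822 section 3.2.4: letters, digits and a fixed set of symbols.
const atext = "[A-Za-z0-9!#$%&'*+\\-/=?^_`{|}~]";

// dot-atom: runs of atext separated by single dots (no leading, trailing or doubled dots).
const dotAtom = `${atext}+(?:\\.${atext}+)*`;

// Simplified addr-spec: dot-atom "@" dot-atom, anchored to the whole string.
// Assumption: quoted local parts, domain literals and comments are deliberately left out.
const addrSpec = new RegExp(`^${dotAtom}@${dotAtom}$`);

// Hypothetical helper: true when s matches the simplified addr-spec above.
function looksLikeAddrSpec(s) {
  return addrSpec.test(s);
}

console.log(looksLikeAddrSpec("jake@smart.com"));        // true
console.log(looksLikeAddrSpec("jake..x@smart.com"));     // false (doubled dot)
console.log(looksLikeAddrSpec("Development, Business")); // false (no "@")
```

Building the larger grammar (display names, quoted strings, address lists) works the same way, by combining small pieces like these into bigger patterns.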

Licensed under: CC-BY-SA with attribution