Regex to match domain.com but not @domain.com

Question 1

After a lot of messing about, this ended up working (with a definite hat tip to @zmo's final comment):

var rx = /\b(www\.)?(\w*@)?([a-zA-Z\-]*\.)(com|org|net|edu|COM|ORG|NET|EDU)(\.au)?(\/\S*)?/g;
var link = txt.match(rx);
if (link !== null) {
  for (var i = 0; i < link.length; i++) {
    if (link[i].indexOf('@') === -1) {
      // no @ in the match: create link
    } else {
      // the match contains an @: create mailto
    }
  }
}

I'm aware of the limitations with regard to sub-domains, TLDs, etc. (which @zmo has addressed above - and if you need to catch all URLs, I'd suggest you adapt that code), but that was not the main issue in my case. The code in my answer matches URLs in a text string even when they lack 'www.', without also catching the domain of an e-mail address.
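
To make the loop above concrete, here is a minimal sketch that fills in the two empty branches. The function name classifyMatches, the kind/href fields and the http:// prefix are assumptions made for illustration; they are not part of the original code.

```javascript
// Hypothetical helper: classify every match as a web link or an e-mail address.
function classifyMatches(txt) {
  var rx = /\b(www\.)?(\w*@)?([a-zA-Z\-]*\.)(com|org|net|edu|COM|ORG|NET|EDU)(\.au)?(\/\S*)?/g;
  var found = txt.match(rx) || []; // match() returns null when nothing matches
  return found.map(function (m) {
    if (m.indexOf('@') === -1) {
      // Assumption: bare domains are linked over http
      return { kind: 'link', href: 'http://' + m };
    }
    return { kind: 'mailto', href: 'mailto:' + m };
  });
}

var sample = classifyMatches('Visit example.com or mail bob@example.org, also www.test.net/path');
console.log(sample);
```

Falling back to an empty array also covers the case where the text contains no match at all, which the null check in the original handles explicitly.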

Question 2

that fails if the match is not at the start of the string

it's because of the ^ at the beginning of the pattern, which anchors the match to the start of the string. Without it:

/(www\.)?([^@])([a-z]*\.)(com|net|edu|org)(\.au)?(\/\S*)?$/g

js> "www.foobar.com".match(/(www\.)?([^@])([a-z]*\.)(com|net|edu|org)(\.au)?(\/\S*)?$/g)
["www.foobar.com"]
js> "aoeuaoeu foobar.com".match(/(www\.)?([^@])([a-z]*\.)(com|net|edu|org)(\.au)?(\/\S*)?$/g)
[" foobar.com"]
js> "toto@aoeuaoeu foobar.com".match(/(www\.)?([^@])([a-z]*\.)(com|net|edu|org)(\.au)?(\/\S*)?$/g)
[" foobar.com"]
js> "toto@aoeuaoeu toto@foobar.com".match(/(www\.)?([^@])([a-z]*\.)(com|net|edu|org)(\.au)?(\/\S*)?$/g)
["foobar.com"]

though it's still matching a space before the domain, because ([^@]) consumes one character of any kind other than @. And it's making wrong assumptions about the domain…

xyz.example.org is a valid domain not matched by your regexp ;
www.3x4mpl3.org is a valid domain not matched by your regexp ;
example.co.uk is a valid domain not matched by your regexp ;
ουτοπία.δπθ.gr is a valid domain not matched by your regexp.
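
These cases can be checked directly. The sketch below assumes the asker's pattern is the one quoted above with the leading ^ restored (the question itself is not reproduced here); the variable names are made up for the example.

```javascript
// Assumption: the asker's pattern, anchored at both ends.
var strict = /^(www\.)?([^@])([a-z]*\.)(com|net|edu|org)(\.au)?(\/\S*)?$/;

var domains = [
  'www.foobar.com',   // positive control
  'xyz.example.org',  // sub-domain
  'www.3x4mpl3.org',  // digits in the label
  'example.co.uk',    // two-part suffix
  'ουτοπία.δπθ.gr'    // non-ASCII labels
];

// Keep only the names the pattern accepts as a whole.
var accepted = domains.filter(function (d) { return strict.test(d); });
console.log(accepted);
```

Only the positive control survives the filter; the four domains from the list are all rejected.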

What defines a legal domain name? It's just a sequence of UTF-8 characters separated by dots. It can't have two dots following each other, and the shortest form is \w\.\w\w (as I don't think a one-letter TLD exists).
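
As a rough illustration of that rule, here is a sketch of a check that is exactly as loose as the description: labels separated by single dots, at least two labels, and a last label of two or more characters. The name looksLikeDomain is hypothetical, and the function does not try to validate which characters are allowed inside a label.

```javascript
// Hypothetical check mirroring the rule described above.
function looksLikeDomain(name) {
  var labels = name.split('.');
  if (labels.length < 2) return false; // needs at least "label.tld"
  // An empty label means a leading, trailing or doubled dot.
  var allNonEmpty = labels.every(function (label) { return label.length > 0; });
  var tld = labels[labels.length - 1];
  // Array.from splits by code point, so non-ASCII labels are counted per character.
  return allNonEmpty && Array.from(tld).length >= 2;
}

console.log(looksLikeDomain('ουτοπία.δπθ.gr'), looksLikeDomain('a..b'), looksLikeDomain('x.y'));
```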

Though, the way I'd do it is to simply match everything that looks like a domain, by taking everything that is text with a dot separator using word boundaries (\b):

/\b(\w+\.)+\w+\b/g

js> "aoe toto.example.org  uaoeu foo.bar aoeuaoeu".match(/\b(\w+\.)+\w+\b/g)
["toto.example.org", "foo.bar"]
js> "aoe toto@example.org toto.example.org  uaoeu foo.bar aoeuaoeu".match(/\b(\w+\.)+\w+\b/g)
["example.org", "toto.example.org", "foo.bar"]
js> "aoe toto@example.org toto.example.org  uaoeu foo.bar aoeuaoeu f00bar.com".match(/\b(\w+\.)+\w+\b/g)
["example.org", "toto.example.org", "foo.bar", "f00bar.com"]

and then make a second round to check whether each domain in the list of candidates really exists. The downside is that \w and \b in JavaScript regexps only cover ASCII word characters, so neither \b nor \w will accept ουτοπία.δπθ.gr as a valid domain name.
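
One possible shape for that second round is to keep only candidates whose last label is a known TLD. This is a sketch under assumptions: KNOWN_TLDS is a tiny made-up allow-list (a real one would be loaded from a maintained TLD list), keepKnownTlds is a hypothetical name, and a TLD check is only one interpretation of "really exists"; it does not verify that the domain is registered.

```javascript
// Hypothetical allow-list, deliberately tiny for the example.
var KNOWN_TLDS = ['com', 'org', 'net'];

// Second pass: keep candidates whose last label is in the allow-list.
function keepKnownTlds(candidates) {
  return candidates.filter(function (domain) {
    var tld = domain.slice(domain.lastIndexOf('.') + 1).toLowerCase();
    return KNOWN_TLDS.indexOf(tld) !== -1;
  });
}

var text = 'aoe toto@example.org toto.example.org  uaoeu foo.bar aoeuaoeu f00bar.com';
var candidates = text.match(/\b(\w+\.)+\w+\b/g) || []; // first pass: anything dotted
var confirmed = keepKnownTlds(candidates);             // second pass: drops foo.bar
console.log(confirmed);
```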

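In ES6, there's the /u modifier, which is available in current browsers, but on its own it leaves \w and \b ASCII-only. To catch ουτοπία.δπθ.gr, spell out the character class with Unicode property escapes (ES2018) and drop \b, which is defined in terms of \w:

"ουτοπία.δπθ.gr aoe toto@example.org toto.example.org  uaoeu foo.bar aoeuaoeu".match(/([\p{L}\p{M}\p{N}_]+\.)+[\p{L}\p{M}\p{N}_]+/gu)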

edit:

A negative lookbehind solves it - but obviously not in JS.

yes it will: for skipping all e-mail addresses, here's a working lookbehind implementation of the regex:

/(?<!@)\b(\w+\.)+\w+\b/g

js> "aoe toto@example.org toto.example.org  uaoeu foo.bar aoeuaoeu f00bar.com".match(/(?<!@)\b(\w+\.)+\w+\b/g)
["toto.example.org", "foo.bar", "f00bar.com"]

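though, as with Unicode support, it depends on the engine: lookbehind was only standardized in ES2018, and older browsers reject the pattern with a syntax error.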
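
Since support depends on the engine, one option is to probe for it at runtime. The sketch below uses the plain negative lookbehind (?<!@) and a hypothetical function name, findDomainsWithLookbehind; it returns null when the pattern cannot be compiled so the caller can fall back to another approach.

```javascript
// Returns the domains found via lookbehind, or null when the engine
// cannot compile the pattern (no lookbehind support).
function findDomainsWithLookbehind(text) {
  var rx;
  try {
    // Built from a string so that an old parser fails here, at runtime,
    // instead of rejecting the whole script because of a regex literal.
    rx = new RegExp('(?<!@)\\b(\\w+\\.)+\\w+\\b', 'g');
  } catch (e) {
    return null;
  }
  return text.match(rx) || [];
}

var viaLookbehind = findDomainsWithLookbehind(
  'aoe toto@example.org toto.example.org  uaoeu foo.bar aoeuaoeu f00bar.com'
);
console.log(viaLookbehind);
```

Note that this only skips the label directly after the @, so an address on a sub-domain (user@mail.example.org) would still leak example.org.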

the way around that in engines without lookbehind is to actually keep the @ in the match, and discard any match that contains an @:

js> "toto.net aoe toto@example.org toto.example.org  uaoeu foo.bar aoeuaoeu f00bar.com".match(/@?\b(\w+\.)+\w+\b/g).filter(function (x) { return !x.match(/@/) })
["toto.net", "toto.example.org", "foo.bar", "f00bar.com"]

or, more compactly, with an ES6 arrow function (the array comprehensions once proposed for ES6/JS1.7 were dropped from the standard, so [x for x of …] does not work in current browsers):

"toto.net aoe toto@example.org toto.example.org  uaoeu foo.bar aoeuaoeu f00bar.com".match(/@?\b(\w+\.)+\w+\b/g).filter(x => !x.includes("@"));

one final update:

/@?\b(\w*[^\W\d]+\w*\.+)+[^\W\d_]{2,}\b/g

> "x.y tot.toc.toc $11.00 11.com 11foo.com toto.11 toto.net aoe toto@example.org toto.example.org  uaoeu foo.bar aoeuaoeu f00bar.com".match(/@?\b(\w*[^\W\d]+\w*\.+)+[^\W\d_]{2,}\b/g).filter(function (x) { if (!x.match(/@/)) return x })
[ 'tot.toc.toc',
  '11foo.com',
  'toto.net',
  'toto.example.org',
  'foo.bar',
  'f00bar.com' ]
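
For reuse, the final pattern can be wrapped in a small function. This is a sketch with a hypothetical name, extractDomains; the one thing it adds is a guard for text without any match, where match() returns null and a direct .filter() call would throw.

```javascript
// Hypothetical wrapper around the final pattern above.
function extractDomains(text) {
  var rx = /@?\b(\w*[^\W\d]+\w*\.+)+[^\W\d_]{2,}\b/g;
  var matches = text.match(rx) || []; // null when nothing matches
  // Matches that kept their leading @ are e-mail domains: drop them.
  return matches.filter(function (m) { return m.indexOf('@') === -1; });
}

var found = extractDomains(
  'x.y tot.toc.toc $11.00 11.com 11foo.com toto.11 toto.net aoe toto@example.org toto.example.org  uaoeu foo.bar aoeuaoeu f00bar.com'
);
console.log(found);
```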