Question

I've been using John Gruber's great URL regex for matching URLs in unstructured text messages. It works fantastically most of the time, but I've found a case in which performance degrades severely depending on the content inside a pair of parentheses.

// The URL matching regex.
var urls = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/ig;

// An example URL that has horrible performance in some modern browsers
var url = "www.linkedin.com/people/~:(id,first-name,last-name,email-address,picture-url,phone-numbers,public-profile-url)";
url.replace(urls, "<a href='$1'>$1</a>");

The original post contains a multiline commented version of the regex, and can be found here:

http://daringfireball.net/2010/07/improved_regex_for_matching_urls

Here's a JSFiddle that does some performance timing around the problem:

http://jsfiddle.net/xMePg/4/

And its output on Chrome:

Gruber URL Regex Performance

www.a.com/:(aaaaaaaaaaaaaa)  1 ms
www.a.com/:(aaaaaaaaaaaaaaa)  0 ms
www.a.com/:(aaaaaaaaaaaaaaaa)  0 ms
www.a.com/:(aaaaaaaaaaaaaaaaa)  2 ms
www.a.com/:(aaaaaaaaaaaaaaaaaa)  3 ms
www.a.com/:(aaaaaaaaaaaaaaaaaaa)  5 ms
www.a.com/:(aaaaaaaaaaaaaaaaaaaa)  11 ms
www.a.com/:(aaaaaaaaaaaaaaaaaaaaa)  22 ms
www.a.com/:(aaaaaaaaaaaaaaaaaaaaaa)  44 ms
www.a.com/:(aaaaaaaaaaaaaaaaaaaaaaa)  87 ms
www.a.com/:(aaaaaaaaaaaaaaaaaaaaaaaa)  174 ms
www.a.com/:(aaaaaaaaaaaaaaaaaaaaaaaaa)  348 ms
www.a.com/:(aaaaaaaaaaaaaaaaaaaaaaaaaa)  704 ms
www.a.com/:(aaaaaaaaaaaa)(aaaaaaaaaaaaa)  0 ms
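
A minimal sketch of a timing harness that could produce a table like the one above. The JSFiddle's actual code is not reproduced here, so the helper names `timeReplace` and `runTimings` and the use of `Date.now()` are assumptions; the regex is the one from the question.

```javascript
// The URL-matching regex from the question, unchanged.
var urls = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/ig;

// Hypothetical helper: run one replace() call and report the elapsed time.
function timeReplace(input) {
  var start = Date.now();
  var output = input.replace(urls, "<a href='$1'>$1</a>");
  return { output: output, ms: Date.now() - start };
}

// Hypothetical helper: grow the run of "a" inside the parentheses one
// character at a time and collect one "input  N ms" row per length.
// Keep maxLength modest, since each extra character roughly doubles the time.
function runTimings(minLength, maxLength) {
  var rows = [];
  for (var n = minLength; n <= maxLength; n++) {
    var input = "www.a.com/:(" + "a".repeat(n) + ")";
    rows.push(input + "  " + timeReplace(input).ms + " ms");
  }
  return rows;
}

console.log(runTimings(14, 20).join("\n"));
```

The absolute numbers will differ between browsers and machines; the growth from one row to the next is the interesting part.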

Can someone identify what may be causing the increase in match times on some modern browsers? I'd like to either cause the match to fail, or optimize the regex in some way.
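
The doubling in the timings above is characteristic of catastrophic backtracking. In the regex, `[^\s()<>]+` sits inside groups that are themselves repeated (`(...)*` and `(?:...)+`), so a run of n ordinary characters can be split across iterations in roughly 2^n ways, and a backtracking engine may try all of them before it gives up on that path. The following is a hedged sketch of one possible fix, not taken from the original post: drop the inner `+` so that each character is consumed by exactly one iteration. The name `urlsDenested` is made up for illustration, the inner groups are made non-capturing, and the variant is only checked against the inputs shown here.

```javascript
// Hypothetical variant of the question's regex: same structure, but the
// character class inside each repeated group no longer has its own "+",
// so there is only one way to split a run of characters across iterations.
var urlsDenested = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]|\((?:[^\s()<>]|\([^\s()<>]+\))*\))+(?:\((?:[^\s()<>]|\([^\s()<>]+\))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/ig;

// The URL from the question that was slow with the original regex.
var url = "www.linkedin.com/people/~:(id,first-name,last-name,email-address,picture-url,phone-numbers,public-profile-url)";
var linked = url.replace(urlsDenested, "<a href='$1'>$1</a>");
console.log(linked);

// A long run inside the parentheses, well past the lengths timed above.
var longUrl = "www.a.com/:(" + "a".repeat(40) + ")";
var start = Date.now();
var matches = longUrl.match(urlsDenested);
console.log(matches[0] === longUrl, (Date.now() - start) + " ms");
```

Because `(?:X+|Y)+` and `(?:X|Y)+` accept the same strings, this variant should match the same URLs as the original; only the amount of backtracking changes.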

No accepted solution

Other tips

Dropping the requirement to match parentheses makes it much faster. This should still work for the vast majority of URLs:

// Simplified URL regex: \S+ replaces the balanced-parentheses groups.
var urls = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)\S+(?:[^\s`!\[\]{};:'".,?«»“”‘’]))/ig;
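
To make the trade-off concrete, here is a small sketch that applies the simplified pattern from this answer (the variable name `simpleUrls` and the sample sentence are assumptions). With `\S+` there is no nested repetition, so the slow URL from the question is matched in one pass, but a closing parenthesis that belongs to the surrounding sentence ends up inside the match.

```javascript
// The simplified pattern from this answer as a JavaScript regex literal.
var simpleUrls = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)\S+(?:[^\s`!\[\]{};:'".,?«»“”‘’]))/ig;

// The URL from the question: the whole string comes back as a single match.
var url = "www.linkedin.com/people/~:(id,first-name,last-name,email-address,picture-url,phone-numbers,public-profile-url)";
var urlMatches = url.match(simpleUrls);
console.log(urlMatches.length === 1 && urlMatches[0] === url); // true

// Trade-off: the ")" that closes the sentence's own parenthesis is captured
// too, because parentheses are no longer treated specially.
var text = "(see www.example.com/foo)";
var textMatches = text.match(simpleUrls);
console.log(textMatches[0]); // "www.example.com/foo)"
```

Whether that trailing character matters depends on how often the input wraps URLs in parentheses.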
License: CC-BY-SA with attribution
Not affiliated with StackOverflow