There are several mistakes in your pattern, and some possible improvements:
/<
\s* # not needed (browsers don't recognize "< a" as an "a" tag)
a # To avoid confusing an "a" tag with the start of an "abbr" tag, you can
# add a word boundary or, better, "\s+", since there is at least one
# whitespace character after it.
. # The dot matches any character except newlines: if an "a" tag spans
# several lines, your pattern will fail. JavaScript engines before ES2018
# have no "singleline" or "dotall" mode (modern ones support the /s flag),
# so the portable fix is `[\s\S]`, which matches any character (any
# whitespace plus any non-whitespace character).
* # Quantifiers are greedy by default: ".*" matches everything up to the end
# of the line, and "[\s\S]*" everything up to the end of the string! The
# regex engine then has to backtrack a lot to find the last "href" (which
# is not always the one you want). Use a lazy quantifier ("*?") instead.
href= # You can add a word boundary before the "h" and allow optional spaces
# around the equal sign to make your pattern more "waterproof": \bhref\s*=\s*
\" # Doesn't need to be escaped. Also, as Markasoftware noticed, an attribute
# value is not always enclosed in double quotes: it can use single quotes
# or no quotes at all. (1)
(.*?)
\" # same thing
.* # same problem: matches everything up to the last ">" on the line
>(.*?)<\/a>/gi
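To see two of these problems in practice, the pattern from the question can be reassembled from the annotated pieces above (`/<\s*a.*href=\"(.*?)\".*>(.*?)<\/a>/gi`) and run on two small made-up inputs: one with a tag split across two lines, and one with two links on the same line. The sample strings and variable names are only for illustration.

```javascript
// The question's pattern, reassembled from the annotated pieces above.
const original = /<\s*a.*href=\"(.*?)\".*>(.*?)<\/a>/gi;

// 1) "." does not match "\n", so a tag spread over two lines is missed.
const multiline = '<a class="x"\n   href="/page">Page</a>';
const multilineMatch = multiline.match(original);
console.log(multilineMatch); // null

// 2) The greedy ".*" runs to the end of the line, then backtracks to the
//    LAST "href": the first link is swallowed into the match.
const twoLinks = '<a href="/one">One</a> and <a href="/two">Two</a>';
const greedyMatch = new RegExp(original.source, "i").exec(twoLinks);
console.log(greedyMatch[1], greedyMatch[2]); // /two Two
```

The second match starts at the first `<a` but captures the URL and text of the second link, which is exactly the "last href" problem described above.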
(1) -> About the quotes and the href attribute value:
To deal with single, double or no quotes, you can use a capturing group and a backreference:
\bhref\s*=\s*(["']?)([^"'\s>]*)\1
Details:
\bhref\s*=\s*
(["']?) # capture group 1: a single quote, a double quote or nothing
([^"'\s>]*) # capture group 2: every character that is not a quote (to stop
# before the closing quote, if any), not a whitespace (URLs don't
# contain spaces, although javascript code in the attribute can) and
# not a ">" (to stop at the first space or at the end of the tag when
# no quotes are used)
\1 # backreference to capture group 1
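A quick check of this subpattern on its own, against a few made-up opening tags (double quotes, single quotes, no quotes, and spaces around `=`). The `i` flag is added, as in the question's pattern, so that `HREF` also matches; `hrefPart` and `samples` are illustrative names.

```javascript
// Group 1 captures the quote (or ""), group 2 the attribute value;
// \1 then requires the same quote (or nothing) right after the value.
const hrefPart = /\bhref\s*=\s*(["']?)([^"'\s>]*)\1/i;

const samples = [
  '<a href="/double">',
  "<a href='/single'>",
  "<a href=/bare>",
  '<a HREF = "/spaced" >',
];

const results = samples.map((tag) => {
  const [, quote, value] = hrefPart.exec(tag);
  return { quote, value };
});
console.log(results);
```

In every case group 2 holds the bare URL, whichever quoting style was used.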
Note that if you use this subpattern, you add a capturing group: the URL is now in capture group 2 and the content between the "a" tags in capture group 3. Remember to update your replacement string accordingly ($2 instead of $1 for the URL, $3 instead of $2 for the content).
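To check the new group numbering, the full pattern can be run once with `exec` on a single made-up link and its groups printed (`linkPattern` and `groups` are illustrative names):

```javascript
const linkPattern =
  /<a\s+[\s\S]*?\bhref\s*=\s*(["']?)([^"'\s>]*)\1[^>]*>([\s\S]*?)<\/a>/i;

const groups = linkPattern.exec('<a href="/docs">Docs</a>');
// groups[1]: the quote character (here a double quote)
// groups[2]: the URL
// groups[3]: the text between the tags
console.log(groups[1], groups[2], groups[3]); // " /docs Docs
```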
In the end, you can write your pattern like this:
aString.replace(/<a\s+[\s\S]*?\bhref\s*=\s*(["']?)([^"'\s>]*)\1[^>]*>([\s\S]*?)<\/a>/gi,
'$3 (Link->$2)');
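Here is the final pattern applied to a made-up `aString` that mixes the cases discussed above: double quotes, a tag split over two lines with single quotes, and an unquoted value. Since the URL is in group 2 (group 1 only holds the quote character), the replacement string here is `'$3 (Link->$2)'`.

```javascript
const aString =
  'See <a href="/one">One</a>, ' +
  "<a class=\"x\"\n   href='/two'>Two</a> and " +
  "<a href=/three>Three</a>.";

const result = aString.replace(
  /<a\s+[\s\S]*?\bhref\s*=\s*(["']?)([^"'\s>]*)\1[^>]*>([\s\S]*?)<\/a>/gi,
  "$3 (Link->$2)"
);
console.log(result);
// See One (Link->/one), Two (Link->/two) and Three (Link->/three).
```

With `$1` in place of `$2`, each link would instead be followed by its quote character (or nothing for the unquoted one).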