RegEx to return 'href' attribute of 'link' tags only?
Question
I'm trying to craft a regex that returns only the hrefs of <link> tags.
Why does this regex return all hrefs, including those of <a> tags?
(?<=<link\s+.*?)href\s*=\s*[\'\"][^\'\"]+
<link rel="stylesheet" rev="stylesheet" href="idlecore-tidied.css?T_2_5_0_228" media="screen"> <a href="anotherurl">Slash Boxes</a>
thank you
Solution
Either
/(?<=<link\b[^<>]*?)\bhref\s*=\s*(?:"[^"]*"|'[^']*'|\S+)/
or
/<link\b[^<>]*?\b(href\s*=\s*(?:"[^"]*"|'[^']*'|\S+))/
The main difference is [^<>]*? instead of .*?. This is because you don't want the search to continue into other tags.
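To see the effect of that change, here is a minimal Python sketch. Python's built-in re module does not accept the variable-length lookbehind of the first pattern, so the sketch uses the capture-group form instead. The `leaky` input string and all variable names are made up for illustration.

```python
import re

# Sample from the question: the <link> tag has an href of its own.
html = ('<link rel="stylesheet" rev="stylesheet" '
        'href="idlecore-tidied.css?T_2_5_0_228" media="screen"> '
        '<a href="anotherurl">Slash Boxes</a>')

# Hypothetical input: a <link> tag WITHOUT an href, followed by an <a> tag.
leaky = '<link rel="stylesheet"> <a href="anotherurl">Slash Boxes</a>'

# .*? is allowed to run past the closing '>' of the <link> tag.
loose = re.compile(r'''<link\s+.*?(href\s*=\s*['"][^'"]+)''')

# [^<>]*? cannot cross a tag boundary, so the search stays inside <link ...>.
strict = re.compile(r'''<link\b[^<>]*?\b(href\s*=\s*(?:"[^"]*"|'[^']*'|\S+))''')

print(loose.findall(leaky))   # picks up the href of the <a> tag
print(strict.findall(leaky))  # finds nothing, as intended
print(strict.findall(html))   # finds only the <link> href
```

The loose pattern reports the href of the <a> tag because .*? happily consumes the end of one tag and the start of the next; the strict pattern stops at the first < or >.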
OTHER TIPS
Avoid lookbehind for such a simple case: just match what you need, and capture what you want to get.
I got good results with <link\s+[^>]*(href\s*=\s*(['"]).*?\2)
in The Regex Coach with the s and g options.
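For readers working in Python rather than The Regex Coach, a rough equivalent might look like the sketch below. It assumes that re.DOTALL stands in for the s option and that finditer plays the role of the g option; the variable names are illustrative only.

```python
import re

html = ('<link rel="stylesheet" rev="stylesheet" '
        'href="idlecore-tidied.css?T_2_5_0_228" media="screen"> '
        '<a href="anotherurl">Slash Boxes</a>')

# Group 1 is the whole href attribute, group 2 is the opening quote,
# and \2 requires the same quote character to close the value.
pattern = re.compile(r'''<link\s+[^>]*(href\s*=\s*(['"]).*?\2)''', re.DOTALL)

# The pattern has two groups, so findall() would return tuples;
# finditer() lets us pick out group 1 directly.
hrefs = [m.group(1) for m in pattern.finditer(html)]
print(hrefs)  # ['href="idlecore-tidied.css?T_2_5_0_228"']
```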
/(?<=<link\s+.*?)href\s*=\s*[\'\"][^\'\"]+[^>]*>/
I'm a little shaky on the back-references myself, so I left that in there. This regex, though:
/(<link\s+.*?)href\s*=\s*[\'\"][^\'\"]+[^>]*>/
...works in my Javascript test.
(?<=<link\s+.*?)href\s*=\s*[\'\"][^\'\"]+
works with Expresso (I think Expresso runs on the .NET regex engine). You could even refine this a bit more to match the closing ' or ":
(?<=<link\s+.*?)href\s*=\s*([\'\"])[^\'\"]+(\1)
Perhaps your regex-engine doesn't work with lookbehind assertions. A workaround would be
(?:<link\s+.*?)(href\s*=\s*([\'\"])[^\'\"]+(\2))
Your match will then be in capture group 1.
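Python's built-in re module is one engine where this workaround is needed: at the time of writing it rejects variable-length lookbehind at compile time. The following sketch checks for that and then falls back to the capture-group pattern; the names `lookbehind_ok`, `workaround`, and `href` are illustrative.

```python
import re

html = ('<link rel="stylesheet" rev="stylesheet" '
        'href="idlecore-tidied.css?T_2_5_0_228" media="screen"> '
        '<a href="anotherurl">Slash Boxes</a>')

# Try to compile the lookbehind version; re raises re.error if the
# lookbehind is not fixed-width.
try:
    re.compile(r'''(?<=<link\s+.*?)href\s*=\s*['"][^'"]+''')
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Workaround: match the <link ...> prefix normally and capture the href.
workaround = re.compile(r'''(?:<link\s+.*?)(href\s*=\s*(['"])[^'"]+(\2))''')
match = workaround.search(html)
href = match.group(1) if match else None

print(lookbehind_ok)  # False with the standard re module
print(href)           # href="idlecore-tidied.css?T_2_5_0_228"
```

Note that this workaround still uses .*?, so it can run into a following tag when the <link> tag itself has no href.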
What regex flavor are you using? Perl, for one, doesn't support variable-length lookbehind. Where that's an option, I'd choose (edited to implement the very good idea from MizardX):
(?<=<link\b[^<>]*?)href\s*=\s*(['"])(?:(?!\1).)+\1
as a first approximation. That way the closing quote must be the same character (' or ") as the opening one, and the other quote character may appear inside the value. The same for a language without support for (variable-length) lookbehind:
(?:<link\b[^<>]*?)(href\s*=\s*(['"])(?:(?!\2).)+\2)
\1 will contain your match.
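A small Python sketch of what the (?!\2). part buys you, compared with the simpler [^'"]+ version. The input `tag`, with an apostrophe inside a double-quoted value, is a made-up example, as are the variable names.

```python
import re

# Hypothetical tag whose double-quoted href contains a single quote.
tag = '''<link rel="stylesheet" href="it's.css">'''

# [^'"]+ stops at EITHER quote character, so the apostrophe breaks the match.
simple = re.compile(r'''(?:<link\s+.*?)(href\s*=\s*(['"])[^'"]+(\2))''')

# (?:(?!\2).)+ only stops at the quote character that opened the value.
quote_aware = re.compile(
    r'''(?:<link\b[^<>]*?)(href\s*=\s*(['"])(?:(?!\2).)+\2)''')

simple_match = simple.search(tag)
aware_match = quote_aware.search(tag)

print(simple_match)          # None
print(aware_match.group(1))  # href="it's.css"
```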