Exclude a string match using C# regex
05-07-2021
Question
I am new to regular expressions. I am trying to find the images that do not have a BORDER attribute, so the result should be the second image. The text I am trying to match is below.
<IMG onerror="this.errored=true;" USEMAP="#Map-43" BORDER="0"/>
<IMG onerror="this.errored=true;" USEMAP="#Map-43" />
<IMG onerror="this.errored=true;" USEMAP="#Map-43" BORDER="0"/>
I tried the following regex, but it didn't work:
<IMG\\s[^((>)&(?!BORDER)]*>
Can anyone help with this, please?
Solution
You can use HtmlAgilityPack to parse the HTML:
// Parse the markup, then keep only the <img> elements that have no border attribute.
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
var imgs = doc.DocumentNode.Descendants("img")
    .Where(n => n.Attributes["border"] == null)
    .ToList();
PS: See also "RegEx match open tags except XHTML self-contained tags".
Other tips
The better choice would be to use an HTML parser for such a problem.
But the main problem with your regex is that you put the lookahead inside a character class, so all of its characters were treated as literal characters.
<IMG\s(?:(?!BORDER)[^>])*>
should work better. See it on Regexr.
But that's only to explain your regex problem. To solve your programming task, please use L.B's answer.
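To make the character-class point concrete, here is a minimal sketch that compares the attempt from the question with the corrected pattern. The class and member names (CharClassDemo, Attempt, Fixed, Matches) are hypothetical, and the attempt is written with a single \s on the assumption that the doubled backslash in the question was C# string escaping. Inside the brackets, the set simply excludes the characters ( ) > & ? ! B O R D E, so the attempt fails on the second image only because USEMAP contains an uppercase E.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class CharClassDemo
{
    // The question's attempt: everything between [^ and ] is a set of literal
    // characters, so "(?!BORDER)" is not a lookahead here.
    public const string Attempt = @"<IMG\s[^((>)&(?!BORDER)]*>";

    // The corrected pattern: the lookahead is checked before each consumed character.
    public const string Fixed = @"<IMG\s(?:(?!BORDER)[^>])*>";

    // Returns true if the pattern matches anywhere in the input (case-sensitive).
    public static bool Matches(string pattern, string input)
    {
        return Regex.IsMatch(input, pattern);
    }
}
```

With these definitions, Attempt rejects the second image from the question but accepts a tag such as <IMG src="a.png" />, which contains none of the excluded characters, while Fixed accepts both and rejects any tag containing BORDER.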
Working example:
String html = "<IMG onerror=\"this.errored=true;\" USEMAP=\"#Map-43\" BORDER=\"0\"/><IMG onerror=\"this.errored=true;\" USEMAP=\"#Map-43\" /><IMG onerror=\"this.errored=true;\" USEMAP=\"#Map-43\" BORDER=\"0\"/>";
Console.WriteLine(Regex.Matches(html, @"<IMG\s(?:(?!BORDER)[^>])*>").Cast<Match>().ToList()[0]);
Console.ReadLine();
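The example above prints only the first match. A minimal sketch that collects every matching tag could look like the following. The names ImgFilter and WithoutBorder are hypothetical, and RegexOptions.IgnoreCase is an added assumption, on the grounds that HTML tag and attribute names are case-insensitive. Like the pattern it reuses, it assumes that no attribute value contains ">" or the text BORDER.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class ImgFilter
{
    // Same pattern as in the answer; IgnoreCase is an assumption added here
    // so that <img border="0"> is treated like <IMG BORDER="0">.
    private static readonly Regex NoBorderImg =
        new Regex(@"<IMG\s(?:(?!BORDER)[^>])*>", RegexOptions.IgnoreCase);

    // Returns the text of every IMG tag in which the word BORDER does not occur.
    public static List<string> WithoutBorder(string html)
    {
        return NoBorderImg.Matches(html)
                          .Cast<Match>()
                          .Select(m => m.Value)
                          .ToList();
    }
}
```

Calling ImgFilter.WithoutBorder(html) on the three-tag string from the working example returns a list containing only the second tag.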
Another way is to select the images that have no border attribute on the client side, using jQuery and a CSS attribute selector:
var $img = $('img').not('[border]');