Question

I'm porting a PHP script to C# for performance reasons.

This is the PHP source I'm having trouble with:

$dom = new DOMDocument;
$dom->loadHTML($message);
foreach ($dom->getElementsByTagName('a') as $node) {
    if ($node->hasAttribute('href')) {
        $link = $node->getAttribute('href');
        if ((strpos($link, 'http://') === 0) || (strpos($link, 'https://') === 0)) {
            $add_key = ((strpos($link, '{key}') !== false) || (strpos($link, '%7Bkey%7D') !== false));
            $node->setAttribute('href', $url . 'index.php?route=ne/track/click&link=' . urlencode(base64_encode($link)) . '&uid={uid}&language=' . $data['language_code'] . ($add_key ? '&key={key}' : ''));
        }
    }
}

The problem I'm having is with the getElementsByTagName part.

As suggested here, I should use HtmlAgilityPack. My code so far is this:

var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(leMessage);

leMessage is a string that holds the HTML. So far so good. The only problem is that there isn't a getElementsByTagName function in HtmlAgilityPack. And with the normal HtmlDocument (without the pack), I can't use a string as the HTML page, right?

So does anybody know what I should do to make this work? The only thing I can think of now is to put a WebBrowser control on the Windows form, set its document content to leMessage, and then parse it from there. But personally I don't like that solution... But if there isn't another way...

Solution

The following was the first top-of-the-page block of code that popped up when I followed your link and clicked on "Examples":

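 HtmlDocument doc = new HtmlDocument();
 doc.Load("file.htm");
 // DocumentNode is the root of the parsed tree; the XPath selects every <a> that has an href.
 // Note: SelectNodes returns null when nothing matches, so guard against that on real input.
 foreach(HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
 {
    HtmlAttribute att = link.Attributes["href"];
    // DO SOMETHING WITH THE LINK HERE
 }
 doc.Save("file.htm");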
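
Applied to the question's case, where the HTML is already in a string, the same XPath approach might look roughly like the sketch below. It is only an illustration under some assumptions: the class and method names (LinkTracker, RewriteLinks) and the parameters baseUrl and languageCode are made up here to stand in for the PHP $url and $data['language_code'], and UTF-8 is assumed when turning the link into bytes for Base64. Uri.EscapeDataString is used in place of PHP's urlencode; the two differ for some characters (spaces, for example), but they should agree on the characters Base64 can produce.

```csharp
using System;
using System.Text;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class LinkTracker // hypothetical name, not part of the original script
{
    // baseUrl and languageCode are assumed stand-ins for $url and $data['language_code'].
    public static string RewriteLinks(string html, string baseUrl, string languageCode)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a file

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links == null)
        {
            return html;
        }

        foreach (HtmlNode node in links)
        {
            string link = node.GetAttributeValue("href", "");

            // Same check as the strpos(...) === 0 tests: only absolute http(s) links.
            if (!link.StartsWith("http://", StringComparison.Ordinal) &&
                !link.StartsWith("https://", StringComparison.Ordinal))
            {
                continue;
            }

            bool addKey = link.Contains("{key}") || link.Contains("%7Bkey%7D");

            // base64_encode + urlencode; UTF-8 is an assumption about the input encoding.
            string encoded = Uri.EscapeDataString(
                Convert.ToBase64String(Encoding.UTF8.GetBytes(link)));

            node.SetAttributeValue(
                "href",
                baseUrl + "index.php?route=ne/track/click&link=" + encoded +
                "&uid={uid}&language=" + languageCode +
                (addKey ? "&key={key}" : ""));
        }

        // Serialize the modified tree back to a string.
        return doc.DocumentNode.OuterHtml;
    }
}
```

With that in place, something like leMessage = LinkTracker.RewriteLinks(leMessage, url, languageCode); would replace the PHP loop. The serialized HTML may not match PHP's output character for character, so it is worth comparing the two on a sample message.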

Please do your own googling in the future.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow