Question

I need a simple preg_match that finds "c.aspx" (without quotes) in the content and, if it is found, returns the whole URL. For example:

$content = '<div>[4]<a href="/m/c.aspx?mt=01_9310ba801f1255e02e411d8a7ed53ef95235165ee4fb0226f9644d439c11039f%7c8acc31aea5ad3998&amp;n=783622212">New message</a><br/>';

It should match "c.aspx" in $content and give this output:

"/m/c.aspx?mt=01_9310ba801f1255e02e411d8a7ed53ef95235165ee4fb0226f9644d439c11039f%7c8acc31aea5ad3998&amp;n=783622212"

$content may contain other links besides the "c.aspx" ones. I don't want those. I only want the URLs that contain "c.aspx".

Please let me know how I can do it.

Solution

You use DOM to parse HTML, not regex. You can use regex to parse the attribute value though.

Edit: updated example so it checks for c.aspx.

$content = '<div>[4]<a href="/m/c.aspx?mt=01_9310ba801f1255e02e411d8a7ed53ef95235165ee4fb0226f9644d439c11039f%7c8acc31aea5ad3998&amp;n=783622212">New message</a>

<a href="#bar">foo</a>

<br/>';

$dom = new DOMDocument();
$dom->loadHTML($content);

$anchors = $dom->getElementsByTagName('a');

// length is already an integer; count() expects an array or Countable
if ( $anchors->length > 0 ) {
    foreach ( $anchors as $anchor ) {
        if ( $anchor->hasAttribute('href') ) {
            $link = $anchor->getAttribute('href');
            // strpos() returns 0 for a match at the start, so compare against false
            if ( strpos( $link, 'c.aspx') !== false ) {
                echo $link;
            }
        }
    }
}
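
As a variation on the loop above, the filtering could also be pushed into an XPath query. This is only a minimal sketch: the shortened `mt` value in the sample markup and the `$links` variable are made up for illustration, and `libxml_use_internal_errors()` is used here on the assumption that real-world pages may contain malformed HTML.

```php
<?php
// Sample markup (hypothetical, shortened from the question's example).
$content = '<div>[4]<a href="/m/c.aspx?mt=01_abc&amp;n=783622212">New message</a>'
         . '<a href="#bar">foo</a><br/></div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Select the href attribute of every <a> whose href contains "c.aspx".
$links = [];
foreach ($xpath->query('//a[contains(@href, "c.aspx")]/@href') as $attr) {
    $links[] = $attr->value;
}

print_r($links);
```

Note that the DOM hands back the decoded attribute value, so `&amp;` in the markup comes out as a plain `&` in the URL.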

OTHER TIPS

If you want to find any quoted string with c.aspx in it:

/"[^"]*c\.aspx[^"]*"|'[^']*c\.aspx[^']*'/

But really, for parsing most HTML you'd be better off with some sort of DOM parser so that you can be sure what you're matching is really an href.
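
If the regex route is taken anyway, the pattern above might be applied with preg_match_all() roughly like this. It is a sketch under assumptions: the sample markup is shortened and partly invented (the single-quoted link is not in the question), and `$urls` is just an illustrative name.

```php
<?php
// Sample markup (hypothetical): one double-quoted and one single-quoted
// c.aspx link, plus a link that should be ignored.
$content = '<div><a href="/m/c.aspx?mt=01_abc&amp;n=783622212">New message</a>'
         . "<a href='/x/c.aspx?id=2'>two</a>"
         . '<a href="#bar">foo</a></div>';

// The pattern from the answer: any quoted string containing "c.aspx".
$pattern = '/"[^"]*c\.aspx[^"]*"|\'[^\']*c\.aspx[^\']*\'/';

preg_match_all($pattern, $content, $matches);

// Each full match still includes its surrounding quotes; strip them.
$urls = [];
foreach ($matches[0] as $quoted) {
    $urls[] = substr($quoted, 1, -1);
}

print_r($urls);
```

Unlike the DOM approach, the regex returns the raw markup text, so `&amp;` stays encoded unless something like html_entity_decode() is applied afterwards. It also matches any quoted string, not only href values.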

Licensed under: CC-BY-SA with attribution