Question

I have a bunch of strings, each containing an anchor tag and a URL.

Example string:

here is a link <a href="http://www.google.com">http://www.google.com</a>. enjoy!

I want to strip out the anchor tags and everything in between.

Desired result:

here is a link. enjoy!

However, the URLs in the href= portion don't always match the link text (sometimes there are shortened URLs, sometimes just descriptive text).

I'm having an extremely difficult time figuring out how to do this with either regular expressions or PHP functions. How can I remove an entire anchor tag/link from a string?

Thanks!

Solution

You shouldn't use regex to parse HTML; use an HTML parser instead.
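
As a rough sketch of the parser route, PHP's bundled DOM extension can load the fragment, drop every <a> element, and serialize what is left. The helper name removeAnchors and the wrapper <div> are assumptions made for illustration, and character-encoding handling is left out, so non-ASCII input may need extra care.

```php
<?php
// Hypothetical helper (not from the original answer): removes every <a>
// element, including its contents, from an HTML fragment.
function removeAnchors(string $html): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally instead of emitting PHP warnings
    // for sloppy markup, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment so its nodes can be found again after parsing.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The node list is live, so copy it to an array before removing nodes.
    foreach (iterator_to_array($doc->getElementsByTagName('a')) as $anchor) {
        $anchor->parentNode->removeChild($anchor);
    }

    // The wrapper is the first <div> in document order.
    $wrapper = $doc->getElementsByTagName('div')->item(0);

    // Serialize only the wrapper's children, not the added <html>/<body>.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$str = 'here is a link <a href="http://www.google.com">http://www.google.com</a>. enjoy!';
echo removeAnchors($str), PHP_EOL;
```

Unlike the regex below, this does not depend on how many anchors the string contains or on what markup sits inside them.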

But if you must use regex, and the inner contents of your anchor tags are guaranteed to be free of HTML like </a>, and each string is guaranteed to contain only one anchor tag as in the example, then (and only then) you can use something like:

Replacing /^(.+)<a.+<\/a>(.+)$/ with $1$2
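
In PHP, that replacement could be sketched with preg_replace() as follows. The function name stripSingleAnchor is an assumption for illustration, and the same restrictions apply: one anchor per string, with text on both sides of it.

```php
<?php
// Hypothetical helper wrapping the replacement described above.
// Assumes exactly one anchor per string, with text before and after it.
function stripSingleAnchor(string $str): ?string
{
    // $1 is the text before the anchor, $2 the text after it.
    // preg_replace() returns null only if the regex engine reports an error.
    return preg_replace('/^(.+)<a.+<\/a>(.+)$/', '$1$2', $str);
}

$str = 'here is a link <a href="http://www.google.com">http://www.google.com</a>. enjoy!';
echo stripSingleAnchor($str), PHP_EOL; // here is a link . enjoy!
```

The space before the anchor survives, so the result reads "link ." rather than "link.". A string that starts or ends with the anchor does not match the pattern and comes back unchanged.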

OTHER TIPS

Looking at your result example, it seems like you're just removing the tags and their content. Did you want to keep the text you stripped out or not? If you do want to keep the link text and drop only the tags, you might be looking for strip_tags().

Since your problem seems to be very specific, I think this should do it:

$str = preg_replace('#\s?<a.*/a>#', '', $str);
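
If a string might contain more than one anchor, the greedy .* above would swallow everything from the first <a to the last /a>. A lazy variant is one way around that; this is a sketch under the assumption that no attribute value contains a ">" character, and the name stripAnchors is made up for illustration.

```php
<?php
// Hypothetical variant of the pattern above for strings with several anchors.
function stripAnchors(string $str): ?string
{
    // \s?      optional whitespace before the tag, so no stray space is left
    // <a\b     an opening anchor tag (but not <abbr>, <article>, ...)
    // [^>]*>   its attributes, up to the end of the opening tag
    // .*?</a>  the shortest run up to the next closing tag
    // i = case-insensitive, s = dot also matches newlines
    return preg_replace('#\s?<a\b[^>]*>.*?</a>#is', '', $str);
}

$str = 'here is a link <a href="http://www.google.com">http://www.google.com</a>. enjoy!';
echo stripAnchors($str), PHP_EOL; // here is a link. enjoy!
```

It is still a pattern match rather than a parser, so nested or malformed markup can trip it up.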

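Just use the normal PHP string functions.

$str = 'here is a link <a href="http://www.google.com">http://www.google.com</a>. enjoy!';
// Split on the closing tag; every piece except the last ends with an anchor.
$parts = explode('</a>', $str);
foreach ($parts as $part) {
    if (strpos($part, 'href') !== false) {
        // Print only the text that comes before the opening <a.
        $pos = strpos($part, '<a');
        echo substr($part, 0, $pos);
    }
}
// The last piece is the text after the final </a>.
print end($parts);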

Output:

$ php test.php
here is a link . enjoy!

$string = 'here is a link <a href="http://www.google.com">http://www.google.com</a>. enjoy!';
$text = strip_tags($string);
echo $text; // Outputs "here is a link http://www.google.com. enjoy!" (tags removed, link text kept)
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow