Question

I want to replace certain HTML tags with an empty string and retrieve only the text. Here is an example of what I want.

preg_match_all("/<span id=\"priceblock_ourprice\" class=\"a-size-medium a-color-price\">(.*)<\/span>/U", $content, $matches);

The line above matches something like this:

<span id="priceblock_ourprice" class="a-size-medium a-color-price">50</span>

Now I want to retrieve only the integer value (i.e. 50). I tried the following statement to remove the HTML tags:

    foreach($matches[0] as $key=>$val) {
        $price = preg_replace( '/<(.*)>/', '', $val);
    }

But the problem is that it replaces everything and returns an empty string. It should return 50, not an empty string. The $price variable should end up as:

$price = 50

Solution

Try adding a question mark to your regular expression to make it non-greedy:

foreach($matches[0] as $key=>$val) {
  $price = preg_replace( '/<(.*?)>/', '', $val);
}

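This makes the pattern stop at the first > instead of the last one. Quantifiers such as * are greedy by default and match as much as they can, so '/<(.*)>/' runs from the first < to the last > and removes the whole element, text included.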
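A minimal sketch of the difference, using the sample element from the question. It also shows PHP's built-in strip_tags(), which removes tags without a hand-written regex; whether that suits your real markup is an assumption worth checking.

```php
<?php
// Sample element taken from the question.
$val = '<span id="priceblock_ourprice" class="a-size-medium a-color-price">50</span>';

// Greedy: .* runs to the LAST '>' in the string, so the whole element is one match.
$greedy = preg_replace('/<(.*)>/', '', $val);

// Lazy: .*? stops at the FIRST '>', so each tag is matched and removed separately.
$lazy = preg_replace('/<(.*?)>/', '', $val);

// Built-in alternative: strip_tags() removes all tags and keeps the text.
$stripped = strip_tags($val);

var_dump($greedy);   // string(0) ""
var_dump($lazy);     // string(2) "50"
var_dump($stripped); // string(2) "50"
```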

Also keep in mind that, as written, $price is overwritten on every iteration of the loop. I assume you do something with $price before the next iteration; if not, store each price in an array instead.
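A sketch of collecting every price into an array, assuming the page contains more than one matching element (the $content string below is hypothetical sample data). Because the question's pattern already has a capture group, $matches[1] holds the text between the tags, so the second regex is optional. The (int) cast gives integers, as the question asks.

```php
<?php
// Hypothetical page content with two matching price elements.
$content = '<span id="priceblock_ourprice" class="a-size-medium a-color-price">50</span>'
    . '<span id="priceblock_ourprice" class="a-size-medium a-color-price">40</span>';

// The question's pattern; the U modifier makes (.*) match as little as possible.
preg_match_all(
    '/<span id="priceblock_ourprice" class="a-size-medium a-color-price">(.*)<\/span>/U',
    $content,
    $matches
);

// Collect every price instead of overwriting a single $price variable.
$prices = [];
foreach ($matches[0] as $val) {
    $prices[] = (int) preg_replace('/<(.*?)>/', '', $val);
}

// $matches[1] holds what the (.*) group captured, i.e. the text between the tags,
// so the same values are available without a second regex.
$fromGroup = array_map('intval', $matches[1]);

print_r($prices);
```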

OTHER TIPS

If a pattern matches more than expected, use ? to make the quantifier non-greedy. A greedy (.*) consumes as much as possible, while the non-greedy (.*?) stops at the first point where the rest of the pattern can match.

preg_replace('/<(.*?)>/', '', $val);
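Two other ways to get the same result, shown as a sketch on the question's sample element: a negated character class, which cannot run past a > at all, and the U modifier already used in the question's preg_match_all, which inverts greediness so that a plain .* becomes lazy.

```php
<?php
// Sample element taken from the question.
$val = '<span id="priceblock_ourprice" class="a-size-medium a-color-price">50</span>';

// Negated character class: [^>]* can never cross a '>', so no laziness is needed.
$negated = preg_replace('/<[^>]*>/', '', $val);

// U modifier: swaps greediness, so here .* is lazy (and .*? would be greedy).
$ungreedyFlag = preg_replace('/<.*>/U', '', $val);

var_dump($negated, $ungreedyFlag);
```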

I would also consider using DOM for this. Here is an example:

$content = <<<DATA
<span id="priceblock_ourprice" class="a-size-medium a-color-price">50</span>
<span id="priceblock_ourprice" class="a-size-medium a-color-price">40</span>
<span id="foo">30</span>
DATA;

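// Keep libxml warnings about imperfect markup (e.g. the repeated id in $content)
// out of the output.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($content); // Load your HTML content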

$xpath = new DOMXPath($doc);
$vals = $xpath->query("//span[@id='priceblock_ourprice']");

foreach ($vals as $val) {
   echo $val->nodeValue . "\n";
}

Output

50
40
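Since the question asks for integer values, here is a sketch of the same DOM approach that collects each price into an array of integers. The markup is the same sample data; trim() guards against stray whitespace inside the span, which is an assumption about real pages.

```php
<?php
// Sample markup (hypothetical page content).
$content = <<<DATA
<span id="priceblock_ourprice" class="a-size-medium a-color-price">50</span>
<span id="priceblock_ourprice" class="a-size-medium a-color-price">40</span>
<span id="foo">30</span>
DATA;

// Collect libxml parse warnings (such as the duplicate id) instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$prices = [];
foreach ($xpath->query("//span[@id='priceblock_ourprice']") as $span) {
    // nodeValue is a string such as "50"; trim whitespace and cast to int.
    $prices[] = (int) trim($span->nodeValue);
}

print_r($prices);
```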
Licensed under: CC-BY-SA with attribution