Question

I am new to DOM parsing, but I have most of this figured out. I'm just having trouble removing &nbsp; from a div.

Here's my PHP:

    function parseDOM($url) {
        $dom = new DOMDocument;
        @$dom->loadHTMLFile($url);
        $xpath = new DOMXPath($dom);
        $movies = array();
        foreach ($xpath->query('//div[@class="mshow"]') as $movie) {
            $item = array();
            $links = $xpath->query('.//a', $movie);
            $item['trailer'] = $links->item(0)->getAttribute('href');
            $item['reviews'] = $links->item(1)->getAttribute('href');
            $item['link'] = $links->item(2)->getAttribute('href');
            $item['title'] = $links->item(2)->nodeValue;
            $item['rating'] = trim($xpath->query('.//strong/following-sibling::text()',
                $movie)->item(0)->nodeValue);
            // Query the date and time nodes once, then pair them up by index.
            $dates = $xpath->query('.//div[@class="rsd"]', $movie);
            $times = $xpath->query('.//div[@class="rst"]', $movie);
            for ($i = 0; $i < $dates->length; $i++) {
                $item['datetime'][] = $dates->item($i)->nodeValue . $times->item($i)->nodeValue;
            }
            $movies[] = $item;
        }
        return $movies;
    }

    $url = 'http://www.tribute.ca/showtimes/theatres/may-cinema-6/mayc5/?datefilter=-1';
    $movies = parseDOM($url);
    foreach ($movies as $key => $value) {
        echo $value['title'] . '<br>';
        echo $value['link'] . '<br>';
        echo $value['rating'] . '<br>';
        foreach ($value['datetime'] as $datetime) {
            echo $datetime . '<br>';
        }
    }

Here's what the HTML looks like:

    <div class="rst" >6:45pm &nbsp;&nbsp;9:30pm &nbsp;&nbsp;</div>

Is there something I can add to the XPath query to strip these out? I tried wrapping $times->item($i)->nodeValue in strip_tags, but the output still looks like: Thu, May 01: 6:45pm   9:30pm  Â

Edit: str_replace("\xc2\xa0", '', $times->item($i)->nodeValue); seems to do the trick.


Solution

Try this:

    $times->item($i)->nodeValue = str_replace("&nbsp;", "", $times->item($i)->nodeValue);

It should delete every &nbsp; that is still present as a literal entity in the string.


EDIT

Your line:

    $item['datetime'][] = $dates->item($i)->nodeValue . $times->item($i)->nodeValue;

becomes:

    $item['datetime'][] = $dates->item($i)->nodeValue
                            . str_replace("&nbsp;", "", $times->item($i)->nodeValue);
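One caveat worth checking: DOMDocument decodes entities while parsing, so by the time nodeValue is read the &nbsp; is normally no longer the literal six-character string but the single character U+00A0, which is the byte pair C2 A0 in UTF-8. That matches the "\xc2\xa0" replacement reported in the question's edit. A minimal sketch of that variant follows; the helper name stripNbsp is hypothetical and not part of the original code.

```php
<?php
// Hypothetical helper: remove non-breaking spaces (U+00A0) from a UTF-8 string.
// DOMDocument returns node values as UTF-8, where U+00A0 is the byte pair C2 A0.
function stripNbsp(string $text): string
{
    return trim(str_replace("\xC2\xA0", '', $text));
}

// Same markup as in the question.
$dom = new DOMDocument();
$dom->loadHTML('<div class="rst" >6:45pm &nbsp;&nbsp;9:30pm &nbsp;&nbsp;</div>');
$raw = $dom->getElementsByTagName('div')->item(0)->nodeValue;

echo stripNbsp($raw), "\n";
```

In the loop from the question, this would be applied as stripNbsp($times->item($i)->nodeValue).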

EDIT 2

If str_replace does not work, try str_ireplace, as suggested in the comments.

If it still doesn't work, you can also try:

    preg_replace("#&nbsp;#", "", $times->item($i)->nodeValue);
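When the string comes from nodeValue, a regular expression has to match the decoded character rather than the entity text. Here is a sketch, assuming the input is valid UTF-8 (the u modifier requires it) and using a hypothetical helper name normalizeSpaces, that turns each run of ordinary whitespace and non-breaking spaces into a single space:

```php
<?php
// Hypothetical helper: collapse whitespace and non-breaking spaces in UTF-8 text.
function normalizeSpaces(string $text): string
{
    // \x{00A0} is the non-breaking space; \s covers ordinary whitespace.
    $collapsed = preg_replace('/[\s\x{00A0}]+/u', ' ', $text);
    // preg_replace returns null on error (e.g. invalid UTF-8); fall back to the input.
    return trim($collapsed ?? $text);
}

echo normalizeSpaces("6:45pm \u{00A0}\u{00A0}9:30pm \u{00A0}\u{00A0}"), "\n";
```

Unlike a plain removal, this keeps one space between the two show times, so they do not run together.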

EDIT 3

You may have an encoding problem; see utf8_encode.
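The stray Â in the output is consistent with this: the UTF-8 encoding of a non-breaking space is the byte pair C2 A0, and a browser that reads the page as ISO-8859-1 shows byte C2 as Â. The sketch below only illustrates that effect. It assumes the mbstring extension is available and uses mb_convert_encoding because utf8_encode is deprecated as of PHP 8.2. Declaring the page as UTF-8, for example with a Content-Type header or a meta charset tag, is the usual way to avoid the misreading.

```php
<?php
$nbsp = "\u{00A0}";                 // non-breaking space as a UTF-8 string
echo bin2hex($nbsp), "\n";          // c2a0

// Reinterpret those two bytes as ISO-8859-1, the way a mis-declared page would.
$misread = mb_convert_encoding($nbsp, 'UTF-8', 'ISO-8859-1');
echo bin2hex($misread), "\n";       // c382c2a0, i.e. "Â" followed by a non-breaking space
```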

Or, as a quick-and-dirty fix:

    str_replace("Â", "", $times->item($i)->nodeValue);
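As for doing the cleanup in the XPath query itself, which is what the question asks: XPath 1.0 provides translate() and normalize-space(), and DOMXPath::evaluate() can return their string result. Here is a sketch, assuming the node values are UTF-8 and using a hypothetical helper name cleanText. Note that normalize-space() on its own does not treat U+00A0 as whitespace, which is why translate() maps it to a plain space first.

```php
<?php
// Hypothetical helper: return a node's text with non-breaking spaces removed,
// using only XPath 1.0 string functions.
function cleanText(DOMXPath $xpath, DOMNode $node): string
{
    $nbsp = "\xC2\xA0"; // U+00A0 in UTF-8
    // Map every non-breaking space to a regular space, then collapse and trim.
    $expr = 'normalize-space(translate(string(.), "' . $nbsp . '", " "))';
    return $xpath->evaluate($expr, $node);
}

$dom = new DOMDocument();
$dom->loadHTML('<div class="rst" >6:45pm &nbsp;&nbsp;9:30pm &nbsp;&nbsp;</div>');
$xpath = new DOMXPath($dom);
$node = $xpath->query('//div[@class="rst"]')->item(0);

echo cleanText($xpath, $node), "\n";
```

In the loop from the question, the call would be cleanText($xpath, $times->item($i)) in place of $times->item($i)->nodeValue.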

Apolo

Licensed under: CC-BY-SA with attribution