Question

Is there a way to remove unwanted text when using getElementsByTagName? For example:

This gets the published date of a movie for my site:

$spans = $dom->getElementsByTagName('span');
for($i=0; $i < $spans->length; $i++){
    $itemprop = $spans->item($i)->getAttribute("itemprop");
    if ($itemprop == "datePublished"){
        if ($spans->item($i)->textContent!='-'){
            $res['published'] = trim($spans->item($i)->textContent);
        }
    }
}

But instead of getting this:

12 July 2011

it gets this:

12 July 2011 10:47 PM, UTC

Is there any code I could add to remove this part?

10:47 PM, UTC

Solution

You could use a regular expression to pull out the value:

preg_match('/^\d+ \w+ \d+/', $spans->item($i)->textContent, $matches);
// The pattern has no capture group, so the full match is at index 0
$published_date = $matches[0];

Assuming the format of the date doesn't change, you shouldn't have a problem. A better idea, however, would be to parse it with DateTime::createFromFormat. This should be correct:

$published_date = DateTime::createFromFormat("d M Y h:i A, e", $spans->item($i)->textContent);

Edit: Updated original code from question with recommended changes:

$spans = $dom->getElementsByTagName('span');
for($i=0; $i < $spans->length; $i++){
    $itemprop = $spans->item($i)->getAttribute("itemprop");
    if ($itemprop == "datePublished"){
        if ($spans->item($i)->textContent!='-'){
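            $text_content = trim($spans->item($i)->textContent);
            $published_date = DateTime::createFromFormat("d M Y h:i A, e", $text_content);
            // createFromFormat() returns false if the text does not match the format
            if ($published_date !== false) {
                // "F" prints the full month name ("July"); "M" would print "Jul"
                $res['published'] = $published_date->format("d F Y");
            }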
        }
    }
}
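
For anyone who wants to try the DateTime approach outside the loop, here is a minimal standalone sketch. The function name extract_published_date is hypothetical (it is not part of the original answer), and it assumes the text always looks like "12 July 2011 10:47 PM, UTC"; anything else yields null rather than an error.

```php
<?php
// Hypothetical helper, not from the original answer: returns only the date
// part of a string such as "12 July 2011 10:47 PM, UTC", or null when the
// text does not match the assumed format.
function extract_published_date(string $text): ?string
{
    $parsed = DateTime::createFromFormat('d M Y h:i A, e', trim($text));
    if ($parsed === false) {
        return null; // unexpected input such as "-" or an empty string
    }
    // "F" is the full month name, so the output matches the page's style.
    return $parsed->format('d F Y');
}

echo extract_published_date('12 July 2011 10:47 PM, UTC'), "\n";
var_dump(extract_published_date('-'));
```

Returning null keeps the caller in control of what to do when the site changes its date format, instead of failing on a method call on false.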

OTHER TIPS

If you know that the string you are fetching is going to be a date, you can use the JavaScript Date object to format it however you wish. See this link for more info: http://www.elated.com/articles/working-with-dates/

As mentioned by Philip, you could use a regular expression.

$pattern = "#([0-9]{2} [a-zA-Z]* [0-9]{4})#i";
$subject = "12 July 2013 10:47PM, UTC";
preg_match($pattern, $subject, $matches);

echo $matches[0]; // prints the full matched text: 12 July 2013

That's how I'd do it, although I can't say which approach performs better or is more convenient in your code.
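
A slightly more defensive version of the regular expression approach might look like the sketch below. The function name date_prefix is made up for illustration, and the pattern assumes the date always comes first in the text. It checks the return value of preg_match before reading $matches, and uses \d{1,2} so single-digit days are accepted too.

```php
<?php
// Hypothetical helper: returns the leading "day month year" part of the
// text, or null when the text does not start with a date in that shape.
function date_prefix(string $text): ?string
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match('/^\d{1,2} [A-Za-z]+ \d{4}/', trim($text), $matches) === 1) {
        return $matches[0]; // index 0 holds the full matched text
    }
    return null;
}

echo date_prefix('12 July 2013 10:47PM, UTC'), "\n";
var_dump(date_prefix('-'));
```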

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow