Question

I'm working on getting my new website up and I cannot figure out the best way to do some parsing.

What I'm trying to do is parse this webpage for the comments (the last 3), the "what's new" section, the permissions page, and the right-hand bar (the one with the ratings etc.).

I have looked at parse_url and a few other methods, but nothing is really working at all.

Any help is appreciated, and examples are even better! Thanks in advance.


Solution

Simple HTML DOM

I use it and it works great, and its documentation includes plenty of samples.
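As a rough sketch of how that library is typically used (assuming its file is named simple_html_dom.php and sits next to your script; the URL is a placeholder), fetching every link on a page looks roughly like this:

include 'simple_html_dom.php';

// file_get_html() downloads the page and returns a parsed, queryable object
$html = file_get_html('http://www.example.com/');

// find() takes CSS-style selectors; 'a' matches every anchor tag
foreach ($html->find('a') as $element) {
    echo $element->href . "\n";
}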

OTHER TIPS

I recommend using the DOM extension for this job. Here is an example that fetches all the URLs on a page:

libxml_use_internal_errors(true); // silence warnings from malformed real-world HTML

$doc = new DOMDocument();
$doc->loadHTMLFile('http://www.theurlyouwanttoscrape.com');

// Collect the href attribute of every anchor tag on the page
foreach ($doc->getElementsByTagName('a') as $item) {
    $href = $item->getAttribute('href');
    var_dump($href);
}

parse_url parses the URL string itself (its scheme, host, path, and so on), not the page the URL points to.

What you want to do is scrape the webpage the URL points to and pick up content from there. You would need to fetch the page source, for example with fopen or file_get_contents, and then parse that HTML to pick out what you need.
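A minimal sketch of that approach (the URL and the "comments" id are placeholders; substitute whatever the real page actually uses):

// Fetch the raw HTML source of the page
$html = file_get_contents('http://www.example.com/');

libxml_use_internal_errors(true); // tolerate malformed real-world HTML
$doc = new DOMDocument();
$doc->loadHTML($html);

// Use XPath to pick out specific content, e.g. paragraphs inside a comments div
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//div[@id="comments"]//p') as $node) {
    echo trim($node->textContent) . "\n";
}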

Disclaimer: Scraping pages is not always allowed.

PHP's SimpleXML extension is your friend here: http://php.net/manual/en/book.simplexml.php
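Note that SimpleXML expects well-formed markup, so for real-world HTML one workable approach (sketched here with a placeholder URL) is to load the page with DOMDocument first and hand the tree to SimpleXML:

libxml_use_internal_errors(true); // tolerate malformed HTML
$doc = new DOMDocument();
$doc->loadHTMLFile('http://www.example.com/');

// Convert the parsed DOM tree into a SimpleXML object for property-style access
$xml = simplexml_import_dom($doc);

// e.g. the page title lives at html > head > title
echo (string) $xml->head->title . "\n";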

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow