Question

I'm working on getting my new website up and I cannot figure out the best way to do some parsing.

What I'm trying to do is parse this webpage for the last three comments, the "what's new" section, the permissions page, and the right-hand bar (the one with the ratings, etc.).

I have looked at parse_url and a few other functions, but nothing has really worked.

Any help is appreciated, and examples are even better! Thanks in advance.


Solution

Simple HTML DOM

I use it and it works great. Samples at the link provided.

Other tips

I recommend using the DOM for this job. Here is an example that fetches all the URLs on a page:

// Load the remote page straight into a DOM tree.
$doc = new DOMDocument();
$doc->loadHTMLFile('http://www.theurlyouwanttoscrape.com');

// Walk every <a> element and dump its href attribute.
foreach ($doc->getElementsByTagName('a') as $item) {
    $href = $item->getAttribute('href');
    var_dump($href);
}

parse_url parses the actual URL (not the page the URL points to).
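To make the distinction concrete, here is a minimal sketch of what parse_url actually gives you (the URL below is just an illustrative example): it splits the URL string itself into components and never makes a network request.

```php
<?php
// parse_url only dissects the URL string; it does not fetch the page.
$parts = parse_url('http://www.example.com/blog/post.php?id=42#comments');

print_r($parts);
// scheme => http, host => www.example.com, path => /blog/post.php,
// query => id=42, fragment => comments
```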

What you want to do is scrape the page the URL points to and pick content out of it. You would need fopen (or file_get_contents), which gives you the HTML source of the page; then parse that HTML and extract what you need.
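A minimal sketch of that fetch-then-parse flow, using an inline HTML snippet so the example is self-contained (in real use you would get $html from file_get_contents on the target URL, and the "comment" class is a hypothetical stand-in for whatever markup the page actually uses):

```php
<?php
// In real use: $html = file_get_contents('http://www.theurlyouwanttoscrape.com');
$html = '<html><body>'
      . '<div class="comment">First</div>'
      . '<div class="comment">Second</div>'
      . '</body></html>';

$doc = new DOMDocument();
// @ silences warnings that real-world, imperfect markup often triggers.
@$doc->loadHTML($html);

// XPath lets you pick out just the elements you care about.
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//div[@class="comment"]') as $node) {
    echo $node->textContent, "\n";
}
```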

Disclaimer: Scraping pages is not always allowed.

The PHP SimpleXML extension is your friend here: http://php.net/manual/en/book.simplexml.php
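A quick sketch of SimpleXML on a made-up ratings fragment (the element and attribute names are illustrative, not from the page in question):

```php
<?php
// simplexml_load_string turns well-formed XML into an object you can walk.
$xml = simplexml_load_string(
    '<ratings><item score="4">Layout</item><item score="5">Content</item></ratings>'
);

foreach ($xml->item as $item) {
    // Element text via (string) cast, attributes via array access.
    echo $item, ': ', $item['score'], "\n";
}
```

Note that SimpleXML requires well-formed XML; for real-world HTML, which is rarely well-formed, DOMDocument's more forgiving loadHTML is usually the safer choice.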

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow