Question

First of all, thank you in advance for your answers.

I cannot get the source code of this page (I want to extract its contents):

http://steamcommunity.com/market/search?q=booster#p2 (stored in $path)

Here is my first attempt:

$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, $path);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt ($ch, CURLOPT_USERAGENT, 'Mozilla/5.0');
$file_contents = curl_exec($ch);
curl_close($ch);
$file_contents =  htmlentities($file_contents);
print_r($file_contents);

Here is a second attempt:

$fp=null;
$fp=@fopen($path,"r");
$contenu = "";
if($fp){
 while(!feof($fp)){
 $contenu .=  stream_get_line($fp,65535);
 }
 print_r($contenu);
}
else{
 echo "Unable to open page $path";
}

With this code I get the source code of this page: http://steamcommunity.com/market/search?q=booster (which is the same as ..../market/search?q=booster#p1), not the second page.

I said that the source code displayed by Firefox is not good and only the DOM inspector allows me to see the "real" source code. Do you have a solution?


Solution 2

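You're hitting the wrong URL. The #p2 part is a fragment, which is never sent to the server; the page's JavaScript reads it and loads the matching results through an AJAX request. Instead, request that AJAX URL directly and parse the response as JSON: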

$f = file_get_contents(
    "http://steamcommunity.com/market/search/render/?" .
    "query=booster&start=10&count=10"
);
$t = json_decode( $f );
print_r( $t );

And you get a neatly organized structure, such as:

stdClass Object (
    [success] => 1
    [start] => 0
    [pagesize] => 10
    [total_count] => 330
    [results_html] => <div class="market_listing_table_header">
    ...
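The start and count parameters in the request above look like a paging offset and a page size. As a minimal sketch, assuming that start is a zero-based offset and that page N of the site corresponds to start = (N - 1) * count (neither is confirmed here), a small helper could build the URL for a given page. The function name buildRenderUrl is hypothetical.

```php
<?php
// Hypothetical helper: builds the AJAX URL for one page of search results.
// Assumption: "start" is a zero-based offset and "count" is the page size.
function buildRenderUrl(string $query, int $page, int $pageSize = 10): string
{
    $params = [
        'query' => $query,
        'start' => max(0, ($page - 1) * $pageSize),
        'count' => $pageSize,
    ];

    // Pass the separator explicitly so the result does not depend on php.ini.
    return 'http://steamcommunity.com/market/search/render/?'
        . http_build_query($params, '', '&');
}

$url = buildRenderUrl('booster', 2);
echo $url, "\n";
```

Under that assumption, page 2 yields the same URL as the request above; the result can be passed to file_get_contents() and json_decode() in the same way.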

Essentially, the JSON response that the page uses to render its results can be decoded into a structured object in PHP. The listings themselves still arrive as an HTML fragment, so you'll need to walk through $t->results_html with DOMDocument / XPath for further parsing.
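Walking through the HTML fragment could look roughly like the sketch below. It runs on a made-up sample string instead of a live response, and the class names market_listing_row and market_listing_item_name are assumptions for illustration; check the actual contents of $t->results_html before relying on them.

```php
<?php
// Hypothetical stand-in for $t->results_html; the real markup may differ.
$resultsHtml =
    '<div class="market_listing_table_header">Name</div>' .
    '<div class="market_listing_row">' .
    '<span class="market_listing_item_name">Example Booster Pack</span></div>' .
    '<div class="market_listing_row">' .
    '<span class="market_listing_item_name">Another Booster Pack</span></div>';

// Collect parser warnings instead of printing them; real-world HTML
// fragments often trigger them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($resultsHtml);
libxml_clear_errors();

// Match spans whose class attribute contains the (assumed) class name.
$xpath = new DOMXPath($doc);
$nodes = $xpath->query(
    '//span[contains(concat(" ", normalize-space(@class), " "), " market_listing_item_name ")]'
);

$names = [];
foreach ($nodes as $node) {
    $names[] = trim($node->textContent);
}
print_r($names);
```

With a real response, replace the sample string with $t->results_html and adjust the XPath expression to the markup you actually receive.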

Other tips

You won't be able to do this using PHP alone. You need to execute the page's JavaScript to get the rendered DOM. (The rendered DOM is what you're seeing when you use the DOM inspector.)

Maybe use PhantomJS to open the page and get the rendered DOM. See the question "Using Phantom.js evaluate, how can I get the HTML of the page?".

I said that the source code displayed by Firefox is not good and only the DOM inspector allows me to see the "real" source code. Do you have a solution?

That's completely backwards. The DOM inspector shows you the current state of the page, as modified by JavaScript and/or the user (e.g., form state changes). The source code as displayed by Firefox's "View Source" is the "real" source code as delivered by the web server.
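This also explains why both attempts in the question return the first page. The #p2 at the end of $path is a URL fragment, and HTTP clients do not send fragments to the server, so the server delivers the same source for #p1 and #p2. A small sketch with parse_url() shows which part of the URL reaches the server; the variable $requestTarget is only illustrative.

```php
<?php
$path = 'http://steamcommunity.com/market/search?q=booster#p2';

// Split the URL into its components.
$parts = parse_url($path);

// Only the path and query are sent in the HTTP request line.
$requestTarget = $parts['path'] . '?' . $parts['query'];

echo $requestTarget, "\n";      // prints "/market/search?q=booster"
echo $parts['fragment'], "\n";  // prints "p2"; it stays on the client side
```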

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow