Question

I am using the PHP Guzzle client to fetch a website, and then processing it with the Symfony 2.1 DomCrawler.

I am trying to access a form, for example this test form: http://de.selfhtml.org/javascript/objekte/anzeige/forms_method.htm

use Guzzle\Http\Client;
use Symfony\Component\DomCrawler\Crawler;

$url = 'http://de.selfhtml.org/javascript/objekte/anzeige/forms_method.htm';
$client = new Client($url);

$request = $client->get();
$request->getCurlOptions()->set(CURLOPT_SSL_VERIFYHOST, false);
$request->getCurlOptions()->set(CURLOPT_SSL_VERIFYPEER, false);
$response = $request->send();
$body = $response->getBody(true);
$crawler = new Crawler($body);
$filter = $crawler->selectButton('submit')->form();
var_dump($filter);die();

But I get the exception:

The current node list is empty.

So I am kind of lost on how to access the form.


Solution

Try using Goutte. It is a screen-scraping and web-crawling library built on top of the tools that you are already using (Guzzle, Symfony2 Crawler). See the GitHub repo for more info.

Your code would look like this using Goutte:

<?php
use Goutte\Client;

$url = 'http://de.selfhtml.org/javascript/objekte/anzeige/forms_method.htm';
$client = new Client();

$crawler = $client->request('GET', $url);
$form = $crawler->selectButton('submit')->form();
$crawler = $client->submit($form, array(
    'username' => 'myuser', // assuming you are submitting a login form 
    'password' => 'P@S5'
));
var_dump($crawler->count());
echo $crawler->html();
echo $crawler->text();

If you really need to set the cURL options, you can do it this way:

<?php
use Goutte\Client;

$url = 'http://de.selfhtml.org/javascript/objekte/anzeige/forms_method.htm';
$client = new Client();
$guzzle = $client->getClient();
$guzzle->setConfig( 
    array(
        'curl.CURLOPT_SSL_VERIFYHOST' => false,
        'curl.CURLOPT_SSL_VERIFYPEER' => false,
    ));
$client->setClient($guzzle);
// ...

UPDATE:

When using the DomCrawler I often get that same error. Most of the time it is because I'm not selecting the correct element on the page, or because the element doesn't exist. Instead of using:

$crawler->selectButton('submit')->form();

do the following:

$form = $crawler->filter('#signin_button')->form();

Here the filter method selects the element by id ('#signin_button') if it has one; you could also select it by class ('.signin_button'). The filter method requires the CssSelector component.

Also debug your form by printing out the HTML (echo $crawler->html();) and ensuring that you are actually on the right page.
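To make that debugging step more concrete, here is a minimal sketch that lists every button-like element in a page body along with its id, name and value, so you can see which selector would actually match. It uses only PHP's built-in DOM extension rather than the DomCrawler. The helper name listButtonCandidates and the sample markup are made up for illustration; in practice you would pass in the HTML you fetched.

```php
<?php
// Hypothetical helper (not part of Goutte or Symfony): collect every element
// in $html that could act as a form button, with the attributes you would
// normally select it by.
function listButtonCandidates($html)
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Note: the type comparison is case-sensitive (matches lowercase values only).
    $nodes = $xpath->query(
        '//button | //input[@type="submit" or @type="button" or @type="image"]'
    );

    $candidates = array();
    foreach ($nodes as $node) {
        $candidates[] = array(
            'tag'   => $node->nodeName,
            'id'    => $node->getAttribute('id'),
            'name'  => $node->getAttribute('name'),
            'value' => $node->getAttribute('value'),
            'text'  => trim($node->textContent),
        );
    }
    return $candidates;
}

// Made-up sample markup standing in for the fetched page body.
$html = '<html><body><form action="/login" method="post">'
      . '<input type="text" name="username">'
      . '<input type="submit" id="signin_button" name="go" value="Sign in">'
      . '</form></body></html>';

$candidates = listButtonCandidates($html);
foreach ($candidates as $c) {
    printf(
        "<%s> id=%s name=%s value=%s text=%s\n",
        $c['tag'], $c['id'], $c['name'], $c['value'], $c['text']
    );
}
```

If the list comes back empty, the page you received probably does not contain the form at all (a redirect, an error page, or markup generated by JavaScript), which would also explain the "current node list is empty" exception.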

Licensed under: CC-BY-SA with attribution