Question

I'm using PHP's cURL to get some tag information from various URLs. My requests work some of the time, but other times they don't work at all. Is there some reason why my code doesn't work? (Note that I'm also using simple_html_dom):

$webpage = 'http://www.some_url.com';

$curl = curl_init(); 
curl_setopt($curl, CURLOPT_URL, $webpage);  
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);  
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($curl, CURLOPT_FAILONERROR, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_AUTOREFERER, true);
curl_setopt($curl, CURLOPT_FRESH_CONNECT, true);

$str = curl_exec($curl);  
curl_close($curl);  

$html = '';

if( !empty($str) )
{
    require_once( 'simple_html_dom.php');

    $html= str_get_html($str);
    $element = $html->find('h1', 0);
    $webpage_name = strip_tags($element);

    $item = $html->find('meta[name=description]', 0);
    $description =  $item->content;
}

// save $description to database
// save $webpage_name to database

For about half the URLs I try, the description and webpage name are stored in my database; for the other half, nothing is stored and the script just stalls. To be specific: when a user submits a URL to my website, a progress bar is shown while the URL uploads, and once the submission completes the progress bar disappears and the URL is displayed on the page for the user to see. For the troublesome URLs, the progress bar goes away, but the link never appears on the page and nothing is stored in my database. What am I missing?


Solution 2

My error log says "Call to undefined function mb_detect_encoding()". That function comes from the mbstring extension, which simple_html_dom.php depends on. MAMP has mbstring installed by default, which is why the code works on my development server but not on my production server. I have put in a request to have mbstring enabled on my Linux production server, and I'll report back on whether that was in fact the problem. I've seen several posts online from people hitting the same error, so I hope this helps others as well.
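A minimal sketch of how you might guard against this situation: check for the mbstring extension up front and fail with a clear message, rather than letting simple_html_dom.php die on an undefined mb_detect_encoding() call. (The log message and exit code here are illustrative choices, not anything from the original post.)

```php
<?php
// Fail fast if the mbstring extension is missing; simple_html_dom.php
// calls mb_detect_encoding(), which is provided by mbstring.
if (!extension_loaded('mbstring')) {
    // On shared hosting you may need to ask the admin to enable mbstring
    // (via php.ini or the distro's php-mbstring package).
    error_log('mbstring extension is not enabled; simple_html_dom requires it');
    exit(1);
}

require_once 'simple_html_dom.php';
```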

OTHER TIPS

Try calling curl_getinfo() before your curl_close() call. Among a ton of other useful info, it gives you the HTTP status code, which will tell you what's happening with your requests. That should give you the answers you need... just make sure to remove that CURLOPT_FAILONERROR setting (or set it to false), since with it enabled curl_exec() simply returns false on HTTP error codes and you lose the details.
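A sketch of that debugging approach, assuming a placeholder URL (http://example.com) in place of the real one:

```php
<?php
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'http://example.com');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
// CURLOPT_FAILONERROR is intentionally NOT set, so the status code
// can be inspected even when the server returns an error response.

$str = curl_exec($curl);

// curl_getinfo() must be called before curl_close().
$status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
$err    = curl_error($curl);   // empty string if there was no transport error
curl_close($curl);

if ($str === false) {
    error_log("cURL transport error: $err");
} elseif ($status >= 400) {
    error_log("Server returned HTTP $status");
}
```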

Your question was a long time ago, but here is my solution. I had the same problem: cURL worked locally on my Windows machine but not on Linux, and only for some URLs. I was already setting CURLOPT_SSL_VERIFYPEER to false, and then I disabled CURLOPT_SSL_VERIFYHOST as well. At least in my case, the URLs that failed had SSL certificates that were not set up correctly for the domain I was accessing. I don't know why it worked on Windows even without this option, but it worked for me.
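A sketch of that workaround (the URL is a placeholder). Note this disables certificate checking entirely, trading security for reachability; in production it is usually better to fix the CA bundle instead, e.g. by pointing CURLOPT_CAINFO at a current cacert.pem:

```php
<?php
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'https://example.com');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
// Skip peer-certificate verification (insecure, but works around
// misconfigured certificates on the remote host).
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
// 0 disables the host-name check; 2 (the default) enforces it.
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);

$str = curl_exec($curl);
curl_close($curl);
```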

Licensed under: CC-BY-SA with attribution