Why does my PHP QueryPath 2.1.2 scraping script on WAMP return only 5 articles instead of 43? Is it a timeout?

Question 1

You might want to add some print statements to at least one of those FOR loops. Several things could be going on here. The two most likely are:

1. The filter may only be matching five items.
2. The HTML parser may be choking on some markup. In this case, it will attempt to load as much of the HTML DOM as it can.

By adding in some print statements, you might be able to see how many times it is iterating.
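As a rough sketch of that kind of check, the snippet below counts both parser warnings and matched nodes for a chunk of HTML. It uses PHP's built-in DOM extension rather than QueryPath, purely so it runs without the library installed. The helper name diagnoseHtml, the sample markup, and the XPath expression are all made up for illustration; you would feed it the HTML of your own listing page.

```php
<?php
// Hypothetical helper (not part of QueryPath): reports how many parse
// warnings libxml raised and how many nodes the XPath expression matched.
function diagnoseHtml($html, $xpath)
{
    // Collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $finder = new DOMXPath($doc);
    $matches = $finder->query($xpath);

    return array(
        'parse_warnings' => count($errors),
        'matches'        => $matches->length,
    );
}

// Assumed sample markup with two intro blocks.
$sample = '<html><body>'
    . '<div class="jbIntroText"><p>One</p></div>'
    . '<div class="jbIntroText"><p>Two</p></div>'
    . '</body></html>';

$report = diagnoseHtml($sample, '//*[contains(@class, "jbIntroText")]//p');
print_r($report);
```

If the match count is already five on the real page, the filter is the problem. If the warning count is high, the markup is a more likely culprit.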

And as an aside, if you're trying to get the list of articles on your blog, reading the RSS or Atom feed might be easier (though I suppose it might not have all the info you need).
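To give an idea of what the feed route could look like, here is a minimal sketch that pulls the title and link out of each item in an RSS 2.0 document using SimpleXML. The function name rssItems and the sample feed are assumptions for illustration, and an Atom feed uses different element names, so it would need its own handling.

```php
<?php
// Hypothetical helper: returns the title and link of every <item>
// in an RSS 2.0 document passed in as a string.
function rssItems($rssXml)
{
    $feed = simplexml_load_string($rssXml);
    if ($feed === false || !isset($feed->channel->item)) {
        return array(); // not parseable, or no items present
    }

    $items = array();
    foreach ($feed->channel->item as $item) {
        $items[] = array(
            'title' => (string) $item->title,
            'link'  => (string) $item->link,
        );
    }
    return $items;
}

// Assumed sample feed; a real one would be downloaded from the blog.
$sample = <<<'XML'
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item><title>First post</title><link>http://myblog.com/first</link></item>
    <item><title>Second post</title><link>http://myblog.com/second</link></item>
  </channel>
</rss>
XML;

foreach (rssItems($sample) as $n => $entry) {
    echo ($n + 1) . '. ' . $entry['title'] . ' -> ' . $entry['link'] . "\n";
}
```

A feed usually needs only one request for the whole list, which also sidesteps most of the load you would otherwise put on the server.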

Question 2

I have solved my problem! Apparently, all I needed was a time delay between requests, because my blog was protecting itself against mass scraping. All I had to do was rewrite the second part of the code like this:

$content = array();
$interval = 2; // After every second request...
$wait = 2;     // ...wait two seconds.
$i = 0;        // Number of pages fetched so far.
foreach ($links as $link) {
    $url = "http://myblog.com" . $link;
    // Fetch each article once and keep its intro text.
    $content[] = htmlqp($url)->find('.jbIntroText p')->text();
    if (++$i % $interval == 0) {
        sleep($wait);
    }
}
print_r($content);

Thanks to Technosophos for the idea, which came from here: "What are the known or expected impact of using Php/Querypath crawler on a target web server, and how can it be kept to a minimum?"

Thanks also for the idea of converting the blog I'm about to scrape to an RSS/Atom feed, since a lot of the time blogs don't generate their own RSS feed.