I'm considering using JSDom for a project that requires scraping a site.
I started by trying it on an Amazon page. Here's the sample code:

jsdom.env(url, ["http://code.jquery.com/jquery.js"], function(errors, window) {
    console.log(errors);
    var $ = window.$,
        results = parseResultsPage($);
    //do some stuff
    window.close();
});

At first, I had an if(errors.length > 0) ... clause, but it turns out errors is never empty. Even though the scraping itself works and I get all the results I need, I always get:

[ { type: 'error',
    message: 'Dispatching event \'DOMNodeInsertedIntoDocument\' failed',
    data: { error: [Object], event: [Object] } } ]

This means I cannot test for errors effectively. Simply ignoring this error feels unsafe to me.

Any suggestions? Could this be an Amazon-related issue? (they're using jQuery 1.2.6 on their pages)

Update:
Submitted an issue on the JSDom GitHub page (link).


Solution

Well, after a debugging session with node-inspector, I managed to single out the piece of code on the Amazon page that triggers that error. It's a CSS rule inside a long inline <style> element that JSDom does not know how to handle:

<style type="text/css">
...
.cust-rec-aui-button @-moz-document url-prefix(){
    .cust-rec-aui-button .a-button .a-button-text{
        line-height:29px
    }

    .cust-rec-aui-button .a-button.a-button-small .a-button-text{
        line-height:21px
    }

}
...
</style>

At first, I thought it was a CSS syntax error (though JSDom is not supposed to throw an exception for those), but then I found some sources (here's one) that say this is perfectly legal.

So, I conferred with the JSDom developers (see the issue on GitHub for the whole correspondence, along with code that reproduces the problem). They have declared it a bug, and hopefully it will get fixed!
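Until a fix lands, one possible stopgap is to filter the known message out of the errors array before testing it, so that any other error still gets noticed. This is only a minimal sketch: unexpectedErrors and KNOWN_BENIGN_MESSAGES are hypothetical names of my own, not part of JSDom, and the message string is copied from the output shown in the question. It also assumes that every error object has a message property, as in that output.

```javascript
// Hypothetical helper, not part of JSDom.
// The message text is copied verbatim from the error output in the question.
var KNOWN_BENIGN_MESSAGES = [
    "Dispatching event 'DOMNodeInsertedIntoDocument' failed"
];

// Returns only the errors whose message is not on the known-benign list.
// A missing errors value (null/undefined) is treated as "no errors".
function unexpectedErrors(errors) {
    return (errors || []).filter(function (err) {
        return KNOWN_BENIGN_MESSAGES.indexOf(err.message) === -1;
    });
}

// Possible use inside the jsdom.env callback from the question:
//   var remaining = unexpectedErrors(errors);
//   if (remaining.length > 0) { /* handle the real errors */ }
```

Matching on the exact message is deliberately narrow: it hides only this one failure, which may still be too broad if the same message can come from other causes, so treat it as a temporary measure.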

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow