Question

I want to get the length of articles published on newspaper and magazine websites and on blogs. In a Node.js server, I want to use the "readabilitySAX" module (https://github.com/fb55/readabilitySAX), but I must be using it incorrectly, because this code does not work:

var Readability = require("readabilitySAX/readabilitySAX.js"),
    Parser = require("htmlparser2/lib/Parser.js");

var readable = new Readability({
    pageURL: "http://www.nytimes.com/2014/04/18/business/treatment-cost-could-influence-doctors-advice.html?src=me&ref=general"
});
var parser = new Parser(readable, {});

console.log(readable.getArticle().textLength);

Solution

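The pageURL option is only used by Readability to resolve relative links; it does not download the page. In your code no HTML is ever written to the parser, so there is nothing for Readability to extract.

To download a page, you can use the get method: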

require("readabilitySAX").get("http://url", {type:"html"}, function(article) {
    // textLength is the length of the extracted article text
    console.log(article.textLength);
});
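If you would rather keep the Parser setup from the question instead of get, the missing step is downloading the HTML yourself and writing it into the parser before calling getArticle(). The following is a minimal sketch using only Node's built-in http/https modules. feedPage is a hypothetical helper, not part of readabilitySAX; it assumes a UTF-8 response and does not follow redirects (an http:// URL that redirects to HTTPS is reported as an error rather than parsed). The readabilitySAX wiring at the end is commented out and assumes an htmlparser2 version compatible with readabilitySAX.

```javascript
var http = require("http"),
    https = require("https");

// Hypothetical helper: downloads pageURL and writes the raw HTML into any
// htmlparser2-style parser (an object with write() and end()).
// onDone(err) is called once the parser has been flushed with end().
// Assumptions: the body is UTF-8, and redirects are NOT followed.
function feedPage(pageURL, parser, onDone) {
    var client = /^https:/.test(pageURL) ? https : http;

    client.get(pageURL, function (res) {
        if (res.statusCode !== 200) {
            res.resume(); // drain the unused body so the socket is released
            onDone(new Error("HTTP " + res.statusCode));
            return;
        }
        res.setEncoding("utf8");
        // Stream each chunk into the parser as it arrives
        res.on("data", function (chunk) {
            parser.write(chunk);
        });
        res.on("end", function () {
            parser.end(); // flush buffered text before reading the article
            onDone(null);
        });
    }).on("error", onDone);
}

// Example wiring with readabilitySAX (needs the readabilitySAX and
// htmlparser2 packages installed; "http://url" is a placeholder):
//
// var Readability = require("readabilitySAX/readabilitySAX.js"),
//     Parser = require("htmlparser2/lib/Parser.js");
// var url = "http://url";
// var readable = new Readability({ pageURL: url }); // still used for relative links
// feedPage(url, new Parser(readable, {}), function (err) {
//     if (err) return console.error(err);
//     console.log(readable.getArticle().textLength);
// });
```

Because feedPage accepts any object with write() and end(), it can be tried without readabilitySAX installed, and getArticle() is only called after end() has flushed the parser.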
Licensed under: CC-BY-SA with attribution