Question

I'm trying to get the contents of the title tag of a URL with cheerio, but I keep getting an empty string. This is my code:

app.get('/scrape', function(req, res){

    url = 'http://nrabinowitz.github.io/pjscrape/';

    request(url, function(error, response, html){
        if(!error){
            var $ = cheerio.load(html);

            var title, release, rating;
            var json = { title : "", release : "", rating : ""};

            $('title').filter(function(){
                //var data = $(this);
                var data = $(this);
                title = data.children().first().text();
                release = data.children().last().children().text();

                json.title = title;
                json.release = release;
            })

            $('.star-box-giga-star').filter(function(){
                var data = $(this);
                rating = data.text();

                json.rating = rating;
            })
        }


        fs.writeFile('output.json', JSON.stringify(json, null, 4), function(err){

            console.log('File successfully written! - Check your project directory for the output.json file');

        })

        // Finally, we'll just send out a message to the browser reminding you that this app does not have a UI.
        res.send('Check your console!')
    })
});

Solution

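var request = require('request');
var cheerio = require('cheerio');

var url = 'http://nrabinowitz.github.io/pjscrape/';

request(url, function (error, response, body) {
  // Only parse when the request succeeded and returned HTTP 200.
  if (!error && response.statusCode === 200) {
    var $ = cheerio.load(body);
    // <title> contains only text, so read it directly with .text().
    var title = $('title').text();
    console.log(title);
  }
});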

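This reads the text inside the page's "title" tag directly. The code in the question returns an empty string because a title element contains only text and no child elements, so data.children().first().text() has nothing to select.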

Other tips

If Robert Ryan's solution still doesn't work, I'd suspect the markup of the original page, which may be malformed somehow.

In my case, the request accepted gzip and other compressed encodings, but I never decoded the response, so cheerio was trying to parse compressed binary data. Logging the raw body to the console showed binary garbage instead of plain-text HTML.
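
One way to check for this case is to look for the gzip magic bytes at the start of the body and decompress before parsing. The sketch below uses only Node's built-in zlib module and simulates the compressed response locally. The helper names looksGzipped and bodyToHtml are hypothetical, and the sketch assumes the body is available as a Buffer and that gzip is the only encoding involved.

```javascript
const zlib = require('node:zlib');

// Hypothetical helper: true when the buffer starts with the gzip magic bytes 0x1f 0x8b.
function looksGzipped(buf) {
  return buf.length >= 2 && buf[0] === 0x1f && buf[1] === 0x8b;
}

// Hypothetical helper: decode a response body to text, gunzipping it first if needed.
function bodyToHtml(buf) {
  const raw = looksGzipped(buf) ? zlib.gunzipSync(buf) : buf;
  return raw.toString('utf8');
}

// Simulate a compressed response body so the sketch runs without a network call.
const page = '<html><head><title>Sample page</title></head><body></body></html>';
const compressed = zlib.gzipSync(Buffer.from(page, 'utf8'));

console.log(looksGzipped(compressed));        // true
console.log(bodyToHtml(compressed) === page); // true
```

With the request library, passing encoding: null should return the body as a Buffer suitable for a check like this, and passing gzip: true should make request decode compressed responses itself, so the manual step may not be needed at all.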

License: CC-BY-SA with attribution
Not affiliated with StackOverflow