How to scrape content after a line break in a web page using Cheerio

https://stackoverflow.com/questions/21499498

05-10-2022

Question

Good day to you all.

I have two questions about web scraping with Cheerio. I went through existing questions that might have had my answer but could not find one that answered mine, so I decided to ask.

Background info: I have only been learning JavaScript for about 2 to 3 months, so I might ask some really basic questions. Please pardon me for that.

Objective: I'm looking to scrape data from the following site - and

For each shop, I want to get:

Name of the bike shop
Address of the bike shop
Telephone number of the bike shop

I've managed to scrape the data I need, but it comes out lumped together in one block of HTML (not sure if that is the right way to describe it). This is the code I used:

var request = require('request');
var cheerio = require('cheerio');

var url = 'http://www.togoparts.com/bikeshops/list_shops.php?country=MY';

// Fetch the page, then parse the HTML body with Cheerio
request(url, function (err, resp, body) {
    if (err)
        throw err;
    var $ = cheerio.load(body, {
        normalizeWhitespace: false
    });
    // Every <td width="52%"> that also has class "verdana1" (one per shop)
    var doc = $("td[width='52%'].verdana1");
    doc.each(function () {
        var link = $(this); // `this` is the table cell, not the <a> inside it
        console.log(link.html());
    });
});

The code runs in a loop and I get the following output. I could not post an image, so I have placed it at the following link.

Question 1: How do I get each piece of data separately?

I need the title of the link. I tried `var link = $(this).attr('href');` but it does not work.

I also need the info after the line break (the bike shop address), and I have no idea how to get it.

Question 2: I tried `var doc = $("td[width='52%'] .verdana1");` (note the space before `.verdana1`). This gives me only the titles of the bike shops I want. How is this different from `var doc = $("td[width='52%'].verdana1");`?

And if I use `var doc = $("td[width='52%'] .verdana1");`, how can I get the bike shop address?

Thank you so much for reading. I have been trying to solve this over Chinese New Year and it is driving me crazy :(. I look forward to learning from you all.

Bryan

Solution

For the links, you can first find the `<a>` elements inside the matched cells and then log each one's href attribute:

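var doc = $("td[width='52%'].verdana1");
// Find every <a> inside the matched cells
var links = doc.find('a');
links.each(function (i, elem) {
    // elem is the raw DOM node; its attributes live in elem.attribs
    console.log(elem.attribs.href);
});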

This logs the value of every href attribute.

Addresses are more complicated because they are not semantically distinguished in the DOM: they are bare text sitting directly inside the table cells, next to the links. So you need a nested loop over each cell's children, keeping only the nodes whose type is text:

doc.each(function (i, elem) {
    // elem here is a table cell
    elem.children.forEach(function (child) {
        // all children of the table cell (i.e. links, spans, divs and plain text)
        if (child.type === "text") {
            console.log(child.data);
        }
    });
});

Hope it helps.

License: CC-BY-SA with attribution

Not affiliated with StackOverflow