Returning URLs from scraping a web page with Node.js

https://stackoverflow.com//questions/22072536

23-12-2019

Question

I'm trying to build a simple web app that scrapes a website, using Node.js and two modules, request and cheerio.

I can do it with the following code:

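    var request = require('request');
    var cheerio = require('cheerio');

    var printURL = function (url) {
        request(url, function (err, resp, body) {
            if (err) throw err;
            var $ = cheerio.load(body);

            // Print the src attribute of every <img> on the page
            $('img').each(function () {
                console.log($(this).attr('src'));
            });
        });
    };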

This works fine for printing the URLs of the images on the site, but what I'm really trying to do is build a list of URLs that I can use outside the function. I tried it this way, but it returns an empty list:

    var urlList = [];
    var printURL = function (url) {
        request(url, function (err, resp, body) {
            if (err) throw err;
            var $ = cheerio.load(body);

            $('img').each(function () {
                urlList.push($(this).attr('src'));
            });
        });
    };

How can I fix this? Many thanks.

Solution

You need to wait until all the callbacks have run: the list is only complete inside the request callback, so that is where you have to use it.

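    var request = require('request');
    var cheerio = require('cheerio');

    var urlList = [];
    var printURL = function (url) {
        request(url, function (err, resp, body) {
            if (err) throw err;
            var $ = cheerio.load(body);
            var images = $('img');
            var counter = images.length;
            if (counter === 0) {
                // No images on the page, so there is nothing to wait for
                console.log(urlList);
                return;
            }
            images.each(function () {
                urlList.push($(this).attr('src'));
                counter--;
                if (counter === 0) {
                    // Now we have all the image URLs: use urlList from here
                    console.log(urlList);
                }
            });
        });
    };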
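The same idea can be generalized so the list really is usable outside the function: instead of logging inside the request callback, hand the finished list to a callback supplied by the caller. The sketch below is a minimal, dependency-free illustration of that pattern, not the answer's exact code. To keep it runnable offline it assumes a hypothetical fakeRequest that stands in for request (it delivers a fixed HTML string asynchronously via setTimeout) and a naive regular expression that stands in for cheerio's $('img').attr('src'); neither is a real HTTP client or HTML parser, and getImageUrls is likewise a made-up name.

```javascript
// Hypothetical stand-in for request(url, cb): calls back asynchronously
// with a fixed HTML body, like an HTTP request would, but without the network.
function fakeRequest(url, callback) {
  const body = '<html><body>' +
    '<img src="/a.png"><p>text</p><img alt="b" src="/b.jpg">' +
    '</body></html>';
  setTimeout(function () {
    callback(null, { statusCode: 200 }, body);
  }, 10);
}

// Naive stand-in for cheerio: collects the src values of <img> tags.
// Illustration only; a regex is not a robust HTML parser.
function extractImageSrcs(html) {
  const srcs = [];
  const re = /<img\b[^>]*\bsrc="([^"]*)"/gi;
  let match;
  while ((match = re.exec(html)) !== null) {
    srcs.push(match[1]);
  }
  return srcs;
}

// The list is passed to `done` only after the request callback has run,
// so the caller receives the complete list instead of an empty one.
function getImageUrls(url, done) {
  fakeRequest(url, function (err, resp, body) {
    if (err) return done(err);
    done(null, extractImageSrcs(body));
  });
}

// Usage: the code after the call runs first, the callback runs later.
getImageUrls('http://example.com', function (err, urls) {
  if (err) throw err;
  console.log(urls); // [ '/a.png', '/b.jpg' ]
});
console.log('request started'); // printed before the list
```

Running it prints "request started" before the list, which is the same ordering that makes urlList appear empty when it is read right after calling printURL in the question.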

This is a consequence of the asynchronous nature of Node.js. If things get more complicated, I'd recommend using a flow-control library such as async.
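Current Node.js versions also have native Promises and async/await, which cover many of the cases a flow-control library was used for. The sketch below uses that swapped-in technique, not the async library's API: it wraps the callback-style call in a Promise and uses Promise.all to scrape several pages in parallel and collect all their image URLs into one list. As in the previous sketch, fakeRequest and the regex extractor are hypothetical offline stand-ins for request and cheerio, and fetchBody and collectImageUrls are illustrative names.

```javascript
// Hypothetical stand-in for request(url, cb): answers asynchronously with
// a small HTML page whose image URLs depend on the page URL (no network).
function fakeRequest(url, callback) {
  const body = '<img src="' + url + '/1.png"><img src="' + url + '/2.png">';
  setTimeout(function () {
    callback(null, { statusCode: 200 }, body);
  }, 10);
}

// Naive stand-in for cheerio's $('img').attr('src'); not a real HTML parser.
function extractImageSrcs(html) {
  const srcs = [];
  const re = /<img\b[^>]*\bsrc="([^"]*)"/gi;
  let match;
  while ((match = re.exec(html)) !== null) {
    srcs.push(match[1]);
  }
  return srcs;
}

// Wrap the callback API in a Promise that resolves with the response body.
function fetchBody(url) {
  return new Promise(function (resolve, reject) {
    fakeRequest(url, function (err, resp, body) {
      if (err) reject(err);
      else resolve(body);
    });
  });
}

// Fetch all pages in parallel; resolves only once every page is done.
async function collectImageUrls(urls) {
  const bodies = await Promise.all(urls.map(fetchBody));
  // Flatten the per-page lists into one array, keeping page order.
  return bodies.flatMap(extractImageSrcs);
}

// Usage: the complete list is available inside .then (or after await).
collectImageUrls(['http://a.example', 'http://b.example'])
  .then(function (list) { console.log(list); })
  .catch(function (err) { console.error(err); });
```

This logs the four image URLs in page order. Promise.all rejects as soon as any page fails, which is a design choice; Promise.allSettled would be the alternative if partial results should be kept.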

Licensed under: CC-BY-SA with attribution

Not affiliated with Stack Overflow