Question

I'm quite new to PHP and am building a web scraper for a project. From this page, https://www.bloglovin.com/en/blogs/1/2/all, I am scraping the blog title, blog URL and image URL, and concatenating a follow link for later use. As you can see on the page, there are several fields of information for each blogger.

Here is my PHP code so far:

<?php

        // Function to make GET request using cURL
        function curlGet($url) {
            $ch = curl_init(); // Initialising cURL session
            // Setting cURL options
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
            curl_setopt($ch, CURLOPT_URL, $url);
            $results = curl_exec($ch); // Executing cURL session
            curl_close($ch); // Closing cURL session
            return $results; // Return the results
        }

        $blogStats = array();

        function returnXPathObject($item) {
            $xmlPageDom = new DomDocument(); 
            @$xmlPageDom->loadHTML($item); 
            $xmlPageXPath = new DOMXPath($xmlPageDom); 
            return $xmlPageXPath; 
        }

        $blPage = curlGet('https://www.bloglovin.com/en/blogs/1/2/all');
        $blPageXpath = returnXPathObject($blPage); 

        $title = $blPageXpath->query('//*[@id="content"]//div/a/h2/span[1]');
        if ($title->length > 0) {
            $blogStats['title'] = $title->item(0)->nodeValue;
        }

        $url = $blPageXpath->query('//*[@id="content"]//div/a/h2/span[2]');
        if ($url->length > 0) {
            $blogStats['url'] = $url->item(0)->nodeValue;
        }

        $img = $blPageXpath->query('//*[@id="content"]//div/a/div/@href');
        if ($img->length > 0) {
            $blogStats['img'] = $img->item(0)->nodeValue;
        }

        $followLink = $blPageXpath->query('//*[@id="content"]/div[1]/div/a/@href');
        if ($followLink->length > 0) {
            $blogStats['followLink'] = 'http://www.bloglovin.com' . $followLink->item($i)->nodeValue;
        }


        print_r($blogStats);


        /*$data = $blogStats;
        header('Content-Type: application/json');
        echo json_encode($data);*/
    ?>

Currently, this only returns:

Array ( [title] => Fashion Toast [url] => fashiontoast.com [followLink] => http://www.bloglovin.com/blog/4735/fashion-toast )

My question is: what is the best way to loop through each of the results? I've been looking through Stack Overflow and am struggling to find an answer, and my head's going a bit loopy! If anyone could advise me or point me in the right direction, that would be fantastic.

Thank you.

Update: I'm very sure this is wrong; I'm receiving errors!

<?php

    // Function to make GET request using cURL
    function curlGet($url) {
        $ch = curl_init(); // Initialising cURL session
        // Setting cURL options
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
        curl_setopt($ch, CURLOPT_URL, $url);
        $results = curl_exec($ch); // Executing cURL session
        curl_close($ch); // Closing cURL session
        return $results; // Return the results
    }

    $blogStats = array();

    function returnXPathObject($item) {
        $xmlPageDom = new DomDocument(); 
        @$xmlPageDom->loadHTML($item); 
        $xmlPageXPath = new DOMXPath($xmlPageDom); 
        return $xmlPageXPath; 
    }

    $blogPage = curlGet('https://www.bloglovin.com/en/blogs/1/2/all');
    $blogPageXpath = returnXPathObject($blogPage);

    $blogger = $blogPageXpath->query('//*[@id="content"]/div/@data-blog-id');
    if ($blogger->length > 0) {
        $blogStats[] = $blogger->item(0)->nodeValue;
    }


    foreach ($blogger as $id) {

        $blPage = curlGet('https://www.bloglovin.com/en/blogs/1/2/all');
        $blPageXpath = returnXPathObject($blPage);

        $title = $blPageXpath->query('//*[@id="content"]//div/a/h2/span[1]');
        if ($title->length > 0) {
            $blogStats[$id]['title'] = $title->item(0)->nodeValue;
        }

        $url = $blPageXpath->query('//*[@id="content"]//div/a/h2/span[2]');
        if ($url->length > 0) {
            $blogStats[$id]['url'] = $url->item(0)->nodeValue;
        }

        $img = $blPageXpath->query('//*[@id="content"]//div/a/div/@href');
        if ($img->length > 0) {
            $blogStats[$id]['img'] = $img->item(0)->nodeValue;
        }

        $followLink = $blPageXpath->query('//*[@id="content"]/div[1]/div/a/@href');
        if ($followLink->length > 0) {
            $blogStats[$id]['followLink'] = 'http://www.bloglovin.com' . $followLink->item($i)->nodeValue;
        }
    }



    print_r($blogStats);


    /*$data = $blogStats;
    header('Content-Type: application/json');
    echo json_encode($data);*/ ?>

Solution

You probably want to add a dimension to your array, keyed by blogger. I assume each blogger has a unique ID or some similar identifier.

Moreover, your code only seems to execute once. It needs to run inside a loop, such as a foreach.

I can't do that part for you, but you need an array containing each blogger, or some other way to drive a while or for loop. You will have to work out how to iterate over the different bloggers yourself :)
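
As a rough sketch of how such an array of bloggers could be built: the markup below is made up for illustration (the real page may well be structured differently), and the data-blog-id attribute is taken from the update in the question. One thing to watch for is that looping over the result of an attribute query gives you DOMAttr objects, and PHP does not accept objects as array keys, so the ID has to be read from nodeValue first. That may be one source of the errors mentioned in the update.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = <<<HTML
<div id="content">
  <div data-blog-id="14"><a href="/blog/14/blogger-one"><h2><span>bloggerOne</span><span>one.example</span></h2></a></div>
  <div data-blog-id="15"><a href="/blog/15/blogger-two"><h2><span>bloggerTwo</span><span>two.example</span></h2></a></div>
</div>
HTML;

// Collect parser warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$blogger = array();
foreach ($xpath->query('//*[@id="content"]/div/@data-blog-id') as $attr) {
    // $attr is a DOMAttr object; its string value is usable as an array key.
    $id = $attr->nodeValue;
    // ownerElement is the <div> carrying the attribute; query relative to it.
    $nameNode = $xpath->query('.//h2/span[1]', $attr->ownerElement)->item(0);
    $blogger[$id] = $nameNode ? $nameNode->nodeValue : null;
}

print_r($blogger);
```

The resulting $blogger array maps each blog ID to a name, which is the shape the foreach in this answer expects.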

Here is an example of an array of bloggers:

[14] => 'bloggerOne'
[15] => 'bloggerTwo'
[16] => 'bloggerThree'

The loop over that array could then look like this:

foreach ($blogger as $id => $name) {
    // $blPage has to be different on each iteration, for example by changing
    // the URL. The URL pattern used here is only a placeholder.
    $blPage = curlGet('https://www.bloglovin.com/en/blogs/1/2/' . $name);
    $blPageXpath = returnXPathObject($blPage);

    $title = $blPageXpath->query('//*[@id="content"]//div/a/h2/span[1]');
    if ($title->length > 0) {
        $blogStats[$id]['title'] = $title->item(0)->nodeValue;
    }

    $url = $blPageXpath->query('//*[@id="content"]//div/a/h2/span[2]');
    if ($url->length > 0) {
        $blogStats[$id]['url'] = $url->item(0)->nodeValue;
    }

    $img = $blPageXpath->query('//*[@id="content"]//div/a/div/@href');
    if ($img->length > 0) {
        $blogStats[$id]['img'] = $img->item(0)->nodeValue;
    }

    $followLink = $blPageXpath->query('//*[@id="content"]/div[1]/div/a/@href');
    if ($followLink->length > 0) {
        // Take the first match; $i is not defined anywhere in this code.
        $blogStats[$id]['followLink'] = 'http://www.bloglovin.com' . $followLink->item(0)->nodeValue;
    }
}

After the foreach, your array could look like this:

['12345']['title'] = whatever
         ['url'] = url
         ['img'] = foo
         ['followLink'] = bar
['4141']['title'] = other
        ['url'] = urlss
        ['img'] = foo
        ['followLink'] = bar
['7415']['title'] = still
        ['url'] = url4
        ['img'] = foo
        ['followLink'] = bar
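
If all bloggers are listed on the one page, as the URL in the question suggests, a nested array like the one above can probably be filled from a single request. The sketch below passes each blogger's container as the context node to DOMXPath::query(), so that relative expressions (starting with a dot) only match inside that blogger. The sample markup, the img element and the helper name firstValue are assumptions made for illustration, not the real page structure.

```php
<?php
// Hypothetical sample markup; the real page structure is an assumption.
$html = <<<HTML
<div id="content">
  <div data-blog-id="12345">
    <a href="/blog/12345/first-blog"><img src="http://img.example/1.jpg"><h2><span>First Blog</span><span>first.example</span></h2></a>
  </div>
  <div data-blog-id="4141">
    <a href="/blog/4141/other-blog"><img src="http://img.example/2.jpg"><h2><span>Other Blog</span><span>other.example</span></h2></a>
  </div>
</div>
HTML;

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Hypothetical helper: string value of the first match relative to $context, or null.
function firstValue(DOMXPath $xpath, $expression, DOMNode $context) {
    $nodes = $xpath->query($expression, $context);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->nodeValue);
}

$blogStats = array();
foreach ($xpath->query('//*[@id="content"]/div[@data-blog-id]') as $blog) {
    $id = $blog->getAttribute('data-blog-id');
    $blogStats[$id] = array(
        'title'      => firstValue($xpath, './/h2/span[1]', $blog),
        'url'        => firstValue($xpath, './/h2/span[2]', $blog),
        'img'        => firstValue($xpath, './/img/@src', $blog),
        'followLink' => 'http://www.bloglovin.com' . firstValue($xpath, './a/@href', $blog),
    );
}

print_r($blogStats);
```

Against the real page the expressions would need adjusting to the actual markup, but the pattern stays the same: one query for the containers, then relative queries inside each container.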
Licensed under: CC-BY-SA with attribution