Question

I'm using ScraperWiki to pull in links from the london-gazette.co.uk site. How would I edit the code so that I can paste in a number of separate search URLs at the bottom which are all collated into the same datastore?

At the moment I can just paste in a new URL, hit run, and the new data is added onto the back of the old data. I was wondering if there's a way to speed things up by getting the scraper to work on several URLs at once. I would only be changing the 'notice code' part of the URLs: issues/2013-01-15;2013-01-15/all=NoticeCode%3a2441/start=1

Sorry, I'm new to Stack Overflow and my coding knowledge is pretty much non-existent, but the code is here: https://scraperwiki.com/scrapers/links_1/edit/


Solution

The scraper you linked to seems to be empty, but I had a look at the original scraper by Rebecca Ratcliffe. If yours is the same, you only have to put your URLs into a list and loop through them with a for-loop:

import urlparse  # Python 2 standard library, as used on ScraperWiki Classic

# One relative search URL per notice code / date range
urls = ['/issues/2013-01-15;2013-01-15/all=NoticeCode%3a2441/start=1',
        '/issues/2013-01-15;2013-01-15/all=NoticeCode%3a2453/start=1',
        '/issues/2013-01-15;2013-01-15/all=NoticeCode%3a2462/start=1',
        '/issues/2012-02-10;2013-02-20/all=NoticeCode%3a2441/start=1']

base_url = 'http://www.london-gazette.co.uk'
for u in urls:
    # scrape_and_look_for_next_link() is defined in the original scraper
    starting_url = urlparse.urljoin(base_url, u)
    scrape_and_look_for_next_link(starting_url)
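
Since you said you'd only be changing the 'notice code' part of the URLs, you could also build the list programmatically instead of pasting each URL in by hand. A minimal sketch, where the notice codes and date range are placeholders you'd swap for your own:

notice_codes = ['2441', '2453', '2462']          # placeholder codes
date_range = '2013-01-15;2013-01-15'             # placeholder date range
urls = ['/issues/' + date_range + '/all=NoticeCode%3a' + code + '/start=1'
        for code in notice_codes]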

Just have a look at this scraper that I copied and adapted accordingly.
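
For context, here is a minimal sketch of the kind of scrape_and_look_for_next_link function this loop plugs into, assuming a structure like the original scraper's. The CSS selectors, column names, and the idea of keying on the link URL are my assumptions, not the original's actual code; scraperwiki.scrape() and scraperwiki.sqlite.save() are real calls in ScraperWiki's classic Python library:

import scraperwiki
import lxml.html

def scrape_and_look_for_next_link(url):
    html = scraperwiki.scrape(url)       # fetch the search results page
    root = lxml.html.fromstring(html)
    root.make_links_absolute(url)

    # Assumed selector -- adjust to match the notice links on the page
    for link in root.cssselect('a[href]'):
        scraperwiki.sqlite.save(
            unique_keys=['url'],         # dedupe on the link URL
            data={'url': link.get('href'), 'text': link.text_content()})

    # Assumed pagination: follow a rel="next" link if one exists
    next_links = root.cssselect('a[rel="next"]')
    if next_links:
        scrape_and_look_for_next_link(next_links[0].get('href'))

Because unique_keys=['url'] deduplicates on the link URL, every URL in your list feeds rows into the same datastore table, and re-running with extra URLs just appends the new rows.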
