Question

I have a list of URLs stored in a variable href. When I pass it through the below function, the only returned recipe_links come from the first URL in href. Are there any glaring errors with my code? I'm not sure why it wouldn't loop through all 20 URLs I have stored in href. The returned results that I get for the first URL in href are retrieved as expected, but I can't get the loop to the next URL.

def first_page_links(link):
    recipe_links = []
    recipe_html = []

    for x in link: 
        page_request = requests.get(x)
        recipe_html.append(html.fromstring(page_request.text))

        print recipe_html

        for x in recipe_html:
            recipe_links.append(x.xpath('//*[@id="content"]/ul/li/a/@href'))

            return recipe_links

Solution 2

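Move both your second loop and your return statement out of the first loop. As written, the return sits inside the inner loop, so the function exits as soon as the first URL has been processed. The nested second loop would also re-process pages it has already handled. Something like the following: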

from lxml import html
import requests as rq

def first_page_links(links):

    recipe_links = []
    recipe_html = []

    for link in links:
        r = rq.get(link)
        recipe_html.append(html.fromstring(r.text))

    for rhtml in recipe_html:
        recipe_links.append(rhtml.xpath('//*[@id="content"]/ul/li/a/@href'))

    return recipe_links

Let us know if this works.

EDIT:

Consider the following:

y_list = []
final_list = []
for x in x_list:
    y_list.append(x)
    for y in y_list:
        final_list.append(y)

This is your function, simplified, with the early return left out. Assuming x_list holds 3 URLs, the following happens:

  1. x1 is appended to y_list.
  2. y_list is processed with only x1 so far, so x1 alone is appended to final_list. final_list now contains: [x1]
  3. x2 is appended to y_list.
  4. y_list now contains x1 and x2. Both are processed and appended to final_list. final_list now contains: [x1, x1, x2].
  5. x3 is appended to y_list. y_list now contains x1, x2, and x3.
  6. See where this is going? :)

Since your second loop, which processes the items in the first list, is inside the first loop, which adds incrementally to the first list, the second loop will process your first list on every iteration of the first loop. This makes it a redundant iteration.
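
To make the redundancy described above concrete, here is a minimal sketch of the simplified loop. The strings "x1", "x2" and "x3" are placeholders standing in for your URLs, so nothing is fetched:

```python
# Placeholder strings stand in for the URLs; no network access is needed.
x_list = ["x1", "x2", "x3"]

y_list = []
final_list = []
for x in x_list:
    y_list.append(x)
    # The inner loop re-walks everything collected so far on every pass.
    for y in y_list:
        final_list.append(y)

print(final_list)  # ['x1', 'x1', 'x2', 'x1', 'x2', 'x3']
```

Three inputs produce six entries, and the count grows quickly: with 20 URLs the nested version would append 210 items instead of 20.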

There are many ways to execute what you wanted to do, but if you're just appending to lists and need a one-pass loop on both, the above fix was all that's needed.
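
As one of those other ways, here is a rough sketch that fetches and extracts in a single loop, so no intermediate list of parsed pages is needed. The names collect_links, fetch and extract are hypothetical, not part of your code: fetch stands in for something like requests.get plus html.fromstring, and extract for the XPath query. Fake data is used so the sketch runs without a network:

```python
def collect_links(urls, fetch, extract):
    """Fetch each URL once and gather the links found on every page."""
    all_links = []
    for url in urls:
        page = fetch(url)                # stand-in for downloading and parsing
        all_links.append(extract(page))  # stand-in for the XPath query
    return all_links                     # reached only after every URL is handled


# Fake "pages" (hypothetical data) so the sketch runs offline.
fake_pages = {
    "page1": ["a", "b"],
    "page2": ["c"],
}

result = collect_links(["page1", "page2"], fake_pages.get, list)
print(result)  # [['a', 'b'], ['c']]
```

Like the fixed function above, this keeps one list of links per page. Use extend instead of append if you would rather have a single flat list.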

OTHER TIPS

Watch where the return is placed. You probably want to return after all the loops are finished:

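from lxml import html
import requests

def first_page_links(link):
    recipe_links = []
    recipe_html = []

    # Fetch and parse every page first.
    for x in link:
        page_request = requests.get(x)
        recipe_html.append(html.fromstring(page_request.text))

    # Then extract the links from each parsed page exactly once.
    for page in recipe_html:
        recipe_links.append(page.xpath('//*[@id="content"]/ul/li/a/@href'))

    # Return only after both loops have finished.
    return recipe_links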
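
To show why the placement of return matters, here is a small sketch comparing the two placements. The function names are made up for illustration, and plain integers stand in for the URLs:

```python
def return_inside_loop(items):
    collected = []
    for item in items:
        collected.append(item)
        return collected  # exits the function during the first iteration


def return_after_loop(items):
    collected = []
    for item in items:
        collected.append(item)
    return collected  # runs once, after every item has been handled


print(return_inside_loop([1, 2, 3]))  # [1]
print(return_after_loop([1, 2, 3]))   # [1, 2, 3]
```

The first version mirrors the original function: return ends the whole function call, not just the current loop, so only the first item is ever seen.
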
Licensed under: CC-BY-SA with attribution