Question

I'm trying to iterate over a list created by splitting the response text returned by the requests module. My goal is to manipulate the captured values and add them to a set. Each page in the xrange should contain exactly 40 of the values I'm looking for, but my code seems to take only the last value on each page rather than every value. Consequently, the loop that should perform string concatenation like so: 'http://example.com' + link1 + '.html', 'http://example.com' + link2 + '.html', 'http://example.com' + link3 + '.html', ... instead produces unwanted single-character substrings like so: 'http://example.com' + 'l' + '.html', 'http://example.com' + 'i' + '.html', 'http://example.com' + 'n' + '.html', .... How can I change this to accomplish the goal, and why did it happen?

    import requests
    from lxml import html  # assumed: html.fromstring comes from lxml

    last_pg = 10
    BASE_URL = 'http://example.com?act=view&NowPage=%s'
    urls = set()
    for i in xrange(last_pg):
        response = requests.get(BASE_URL % i)
        parsed_body = html.fromstring(response.text)
        links = response.text.split('-p-')[-1].split('-cat-')[0]
        print links  # this seems to print the last value of each iteration rather than all of them
        for link in links:  # this loop breaks each link value into substrings and interpolates the substrings
            finallink = ('http://example.com-' + link.encode('ascii', 'ignore') + '.html')
            urls.add(finallink)
            print "added %s to que" % finallink
    print urls
    print len(urls)

Solution

split returns a list, but you index into that list before doing the second split, so you only ever get a single element from it. response.text.split('-p-') gives you a list, but response.text.split('-p-')[-1] gives you only the last element of that list, which is a single string. If you did something like:

    links = [x.split('-cat-')[0] for x in response.text.split('-p-')]

you would get a list closer to what you want, but you may need some more processing, either by changing the index you take from the '-cat-' split or by iterating once more over the list from that split.
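As a rough sketch of that extra processing: the chunk before the first '-p-' is page text rather than a link value, so it probably needs to be skipped. The sample string and the helper name extract_ids below are made up for illustration, since the real page markup is not shown in the question.

```python
def extract_ids(text):
    """Return the value between each '-p-' and the following '-cat-'."""
    # Chunk 0 is whatever precedes the first '-p-', so it is dropped.
    chunks = text.split('-p-')[1:]
    return [chunk.split('-cat-')[0] for chunk in chunks]

# Hypothetical stand-in for response.text; the real markup may differ.
sample = 'a href="shoe-p-101-cat-5.html" a href="hat-p-202-cat-7.html"'

ids = extract_ids(sample)
urls = set('http://example.com-' + value + '.html' for value in ids)
```

With this shape, each element of ids is a whole link value, so the inner loop in the question would build one URL per value.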

The reason you are getting single letters is that you are iterating over a string rather than a list of strings, so the loop yields the individual characters of that string instead of whole link values.
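A small illustration of the difference, using made-up values ('link1', 'link2') in place of the real page data:

```python
# Iterating over a single string yields its characters one at a time.
as_string = 'link1'
from_string = ['http://example.com' + part + '.html' for part in as_string]

# Iterating over a list of strings yields each whole string.
as_list = ['link1', 'link2']
from_list = ['http://example.com' + part + '.html' for part in as_list]
```

Here from_string holds five one-character URLs, while from_list holds one URL per link value.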

License: CC-BY-SA with attribution
Not affiliated with StackOverflow