Question

Here is the code I have tried. The files are 0 bytes. I've also set imagedata=br.download(...) and it reports 0 for len(). I've been at this for hours... any ideas?

pre_record_soup = "[<img src='/show_pic.php?id=316600'>]"  # simplified

def func_get_pic(pre_record_soup, br=spynner.Browser()):
    baseurl='http://www.testsite.com/'

    for record in pre_record_soup:
        imagetag=record.find('img')
        filename = 'image.jpg' #set name of file afterdownload

        try:
            if imagetag:
                piclink = imagetag['src']  # imagetag is already the <img> tag
            else:
                piclink = 'basicimages/icons/icon.gif'
                filename = 'icon.gif'
        except TypeError:
            return None

        print baseurl+piclink #this prints the expected link
        print filename #this prints the filename I want

        with open('/home/myhome/'+filename, 'wb') as handle:
            br.download(baseurl+piclink,handle) #not retrieving image...

I'm also calling this function within an authenticated session from spynner. So spynner logs me into a website, and I scrape this and other data. The other data (text) scrapes fine. Additionally, when I visit the image URL in a browser it properly displays the jpeg file.

Thanks for any help!

Edit (10 March 2014): Here is the debug output spynner gives me. Note the correctly formatted URL for the PHP-served image, and the absence of the "Read from download stream" line that appears for the correctly downloaded .gif:

http://www.testsite.com/show_pic.php?id=81851
Request: GET http://www.testsite.com/show_pic.php?id=81851
Start download: http://www.testsite.com/show_pic.php?id=81851
Download finished: http://www.testsite.com/show_pic.php?id=81851
http://www.testsite.com/basicimages/icons/icon.gif
Request: GET http://www.testsite.com/basicimages/icons/icon.gif
Start download: http://www.testsite.com/basicimages/icons/icon.gif
Read from download stream (419 bytes): http://www.testsite.com/basicimages/icons/icon.gif
Download finished: http://www.testsite.com/basicimages/icons/icon.gif

Additional info: debug stream from a br.load attempt. Note that the Content-Length is 0 bytes. This loads FINE in Firefox... UGH!

Page load started
Request: GET http://www.testsite.com/show_pic.php?id=81851
  Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
  User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.21 (KHTML, like Gecko) Qt/4.8.4 Safari/537.21
Reply: 200/OK - http://www.testsite.com/show_pic.php?id=81851
  Date: Tue, 11 Mar 2014 01:16:35 GMT
  Server: Apache
  Set-Cookie: PHPSESSID=abvcv4j6hbu57a638tc8pg8i77b19bl0; path=/
  Content-Length: 0
  Connection: close
  Content-Type: text/html
Page load finished (39 bytes): http://www.testsite.com/show_pic.php?id=81851 (successful)

Solution 2

Answer:

Calling func_get_pic from outside the code that logs into testsite uses a different browser instance, one that does not carry the login session. The same code, copied and pasted into the login function, works fine. That's my workaround until I figure out how to pass the login session from one function to another.
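
One likely reason, judging by the signature in the question, is the default argument br=spynner.Browser(): Python evaluates a default once, when the function is defined, so a call that omits br gets a separate browser that never logged in. The sketch below illustrates this with FakeBrowser, a hypothetical stand-in (not spynner's API) that returns an empty body when it has no session. With real spynner the idea would be to pass the already logged-in Browser object as br.

```python
class FakeBrowser(object):
    """Hypothetical stand-in for spynner.Browser; it tracks only login state."""

    def __init__(self):
        self.logged_in = False

    def login(self):
        self.logged_in = True

    def download(self, url):
        # An anonymous session gets an empty body, like the 0-byte reply above.
        return b"JPEGDATA" if self.logged_in else b""


def get_pic_default(url, br=FakeBrowser()):
    # The default browser is created once, at definition time,
    # so it never sees a login performed on another instance.
    return br.download(url)


def get_pic(url, br):
    # The caller must hand over the browser that holds the session.
    return br.download(url)


session = FakeBrowser()
session.login()

url = "http://www.testsite.com/show_pic.php?id=316600"
empty = get_pic_default(url)   # falls back to the anonymous default browser
image = get_pic(url, session)  # reuses the authenticated browser
```

Here empty has length 0 while image holds data, which mirrors the difference between calling the function with and without the logged-in browser.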

OTHER TIPS

According to your code, if piclink already holds the full URL after parsing:

http://www.testsite.com/show_pic.php?id=316600

then baseurl+piclink produces:

http://www.testsite.com/http://www.testsite.com/show_pic.php?id=316600

If that is what is happening, you now know where the error is. Adjust the URL accordingly and it should solve your problem!
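
A way to sidestep the concatenation question altogether is to let the standard library resolve the src value against the base URL. This is a minimal sketch, not code from the question: absolute_url and BASE_URL are illustrative names, and the import fallback is there only because the question uses Python 2.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2, as used in the question

BASE_URL = 'http://www.testsite.com/'


def absolute_url(src, base=BASE_URL):
    # urljoin resolves rooted and relative paths against the base
    # and leaves an already absolute URL unchanged.
    return urljoin(base, src)


rooted = absolute_url('/show_pic.php?id=316600')
relative = absolute_url('basicimages/icons/icon.gif')
already_full = absolute_url('http://www.testsite.com/show_pic.php?id=316600')
```

All three inputs resolve to a single well-formed URL on www.testsite.com, so the result is the same whether the parsed src is rooted, relative, or already absolute.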

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow