Question

I want to get the dimensions of images as they appear to a visitor viewing a website.

I'm using Beautiful Soup, and I get the image tags like this:

links = soup.findAll('img', {"src":True})

The way I get the image dimensions is by using:

for link in links:
    if link.has_attr('height'):  # older Beautiful Soup versions call this has_key()
        height = link['height']

and similarly for width. However, some img tags have only one of these attributes. I tried PIL, but that gives the actual size of the image file once it is downloaded, not the size shown on the page.

Is there any other way of finding the image dimensions as seen on a website?


Solution

Your main issue is that you are only looking in the HTML source for height and width attributes. Many images don't have a height and width specified in the HTML at all, and unless CSS on the page sizes them, they are rendered at the height and width of the image file itself.
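To make that rule concrete, here is a minimal sketch of how a browser would size an img element from the file's intrinsic dimensions plus whatever width/height attributes the tag carries. It assumes no CSS on the page resizes the image, and `rendered_size` is a hypothetical helper, not part of Beautiful Soup or PIL. When both attributes are present they are used as given; when only one is present, the other is scaled to keep the file's aspect ratio, which is the situation from the question where a tag has only one of them.

```python
from typing import Optional, Tuple


def rendered_size(
    intrinsic: Tuple[int, int],
    width_attr: Optional[int] = None,
    height_attr: Optional[int] = None,
) -> Tuple[int, int]:
    """Estimate the on-screen size of an <img>, assuming no CSS resizes it.

    intrinsic   -- (width, height) of the image file, e.g. PIL's Image.size
    width_attr  -- the tag's width attribute as an int, or None if absent
    height_attr -- the tag's height attribute as an int, or None if absent
    """
    w, h = intrinsic
    if width_attr is not None and height_attr is not None:
        # Both attributes present: the browser uses them as given.
        return width_attr, height_attr
    if width_attr is not None:
        # Only width present: height follows the file's aspect ratio.
        return width_attr, round(width_attr * h / w)
    if height_attr is not None:
        # Only height present: width follows the file's aspect ratio.
        return round(height_attr * w / h), height_attr
    # No attributes: the image is shown at the file's own size.
    return w, h


# A 400x300 file with only width="200" is displayed at 200x150.
print(rendered_size((400, 300), width_attr=200))
```

The result is rounded to whole pixels as an approximation; browsers lay pages out with fractional pixels.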

To get the height and width of the image file, you need to actually request and load that file, then read its dimensions with an image-processing library such as PIL. For example:

from io import BytesIO
from urllib.parse import urljoin
from urllib.request import urlopen

from PIL import Image

# given a Beautiful Soup <img> tag called 'link'

SITE_URL = "http://www.targetsite.com"
URL = urljoin(SITE_URL, link['src'])  # handles both relative and absolute src values
# Here's a sample url that works for demo purposes
# URL = "http://therealtomrose.therealrosefamily.com/wp-content/uploads/2012/08/headshot_tight.png"
file = BytesIO(urlopen(URL).read())  # download the image into memory
im = Image.open(file)
width, height = im.size  # intrinsic size of the image file
if link.has_attr('height'):
    height = int(link['height'])  # set height if site modifies it
if link.has_attr('width'):
    width = int(link['width'])  # set width if site modifies it

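Requirements: This method requires PIL for image processing. The original PIL package is no longer maintained; use its fork Pillow, which is still imported with from PIL import Image.

# from the command line, in a virtual environment
pip install Pillow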
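Note that `link['width']` and `link['height']` come back as strings. The HTML standard expects plain non-negative integers there, but older pages also contain values such as `"120px"` or `"50%"`, on which `int()` raises `ValueError`. Browsers are lenient with such values, so a small helper can roughly approximate that leniency. This is a minimal sketch: `parse_dimension` is a hypothetical name, it is not the exact HTML parsing algorithm, and percentages are returned as `None` because their pixel size depends on the surrounding layout, which you can't know without rendering the page.

```python
import re
from typing import Optional

# Optional leading whitespace, leading digits, an optional fraction, an optional '%'.
_DIMENSION = re.compile(r"\s*(\d+)(?:\.\d*)?(%?)")


def parse_dimension(value: Optional[str]) -> Optional[int]:
    """Turn an HTML width/height attribute value into whole pixels.

    Returns None when the value is missing, not numeric, or a percentage.
    Any fraction is dropped and trailing text such as "px" is ignored.
    """
    if value is None:
        return None
    match = _DIMENSION.match(value)
    if match is None or match.group(2) == "%":
        return None
    return int(match.group(1))


# Typical use with a Beautiful Soup tag: parse_dimension(link.get('width')),
# since Tag.get returns None when the attribute is absent.
for raw in ["120", " 120px", "50%", "auto", None]:
    print(repr(raw), "->", parse_dimension(raw))
```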
Licensed under: CC-BY-SA with attribution