Question

Given an HTML page containing a news article, I'm trying to detect the article's relevant image(s). To do this, I look at the image sizes (images that are too small are likely navigational elements), but I don't want to download every image.

Is there a way to get the width and height of an image without downloading the full image?


Solution

I don't know whether it will speed up your application, but it can be done. Check out these two articles:

http://www.anttikupila.com/flash/getting-jpg-dimensions-with-as3-without-loading-the-entire-file/ for JPEG

http://www.herrodius.com/blog/265 for PNG

Both are written in ActionScript, but the principle (read only the header bytes that store the dimensions, not the whole file) applies to other languages as well.

I made a sample in C#. It's not the prettiest code and it only works for JPEGs, but it can easily be extended to PNG too:

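using System;
using System.IO;
using System.Net;

static class JpegSize
{
    // Reads one byte, throwing instead of returning -1 at end of stream
    static int Next(Stream s)
    {
        int b = s.ReadByte();
        if (b == -1) throw new EndOfStreamException();
        return b;
    }

    // Reads a big-endian 16-bit value (JPEG stores lengths and sizes this way)
    static int ReadUInt16(Stream s) => (Next(s) << 8) | Next(s);

    static void Main()
    {
        var request = (HttpWebRequest) WebRequest.Create("http://unawe.org/joomla/images/materials/posters/galaxy/galaxy_poster2_very_large.jpg");
        using (WebResponse response = request.GetResponse())
        using (Stream responseStream = response.GetResponseStream())
        {
            // Every JPEG starts with the SOI marker FF D8
            if (Next(responseStream) != 0xFF || Next(responseStream) != 0xD8)
                throw new InvalidDataException("Not a JPEG file");

            while (true)
            {
                // Each segment starts with FF; extra FF fill bytes may precede the marker code
                if (Next(responseStream) != 0xFF)
                    throw new InvalidDataException("Expected a segment marker");
                int marker;
                do { marker = Next(responseStream); } while (marker == 0xFF);

                // Standalone markers (RSTn, TEM) have no length field
                if ((marker >= 0xD0 && marker <= 0xD7) || marker == 0x01)
                    continue;

                // Start of scan or end of image: compressed data follows, no SOF found
                if (marker == 0xDA || marker == 0xD9)
                    break;

                // Segment length includes the two length bytes themselves
                int length = ReadUInt16(responseStream);

                // SOF0..SOF15 (baseline, progressive, ...), except DHT (C4), JPG (C8) and DAC (CC)
                if (marker >= 0xC0 && marker <= 0xCF && marker != 0xC4 && marker != 0xC8 && marker != 0xCC)
                {
                    // Bit depth - don't care
                    Next(responseStream);

                    // SOF stores the height (number of lines) before the width
                    int height = ReadUInt16(responseStream);
                    int width = ReadUInt16(responseStream);

                    Console.WriteLine(width + "x" + height);
                    return;
                }

                // APPn, DQT, DHT and other segments: skip the payload instead of scanning it
                for (int i = 0; i < length - 2; i++)
                    Next(responseStream);
            }
        }
        Console.WriteLine("No SOF marker found");
    }
}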
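Extending the sample to PNG, as mentioned above, is simpler than JPEG: the PNG specification requires the IHDR chunk to come first, directly after the 8-byte signature, and IHDR begins with the width and height as 4-byte big-endian integers. So the first 24 bytes of the file are enough. A minimal sketch in C#; `PngSize` and its methods are hypothetical names, not a library API, and it only validates the signature and the chunk type:

```csharp
using System;
using System.IO;

static class PngSize
{
    // The fixed 8-byte signature every PNG file starts with
    static readonly byte[] Signature = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    // Reads exactly 'count' bytes, looping because Stream.Read may return fewer
    static byte[] ReadFully(Stream s, int count)
    {
        var buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int n = s.Read(buffer, offset, count - offset);
            if (n == 0) throw new EndOfStreamException();
            offset += n;
        }
        return buffer;
    }

    // Decodes a 4-byte big-endian integer starting at index i
    static int ReadInt32BigEndian(byte[] b, int i) =>
        (b[i] << 24) | (b[i + 1] << 16) | (b[i + 2] << 8) | b[i + 3];

    // Returns the image size from the IHDR chunk, reading only the first 24 bytes:
    // 8 signature bytes, 4 chunk-length bytes, 4 chunk-type bytes, then width and height
    public static (int Width, int Height) Read(Stream s)
    {
        byte[] header = ReadFully(s, 24);
        for (int i = 0; i < Signature.Length; i++)
            if (header[i] != Signature[i])
                throw new InvalidDataException("Not a PNG file");
        if (header[12] != 'I' || header[13] != 'H' || header[14] != 'D' || header[15] != 'R')
            throw new InvalidDataException("First chunk is not IHDR");
        return (ReadInt32BigEndian(header, 16), ReadInt32BigEndian(header, 20));
    }
}
```

Since it accepts any Stream, it could be called as `var (w, h) = PngSize.Read(responseStream);` inside the same `using` blocks as the JPEG sample.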

EDIT: I'm no Python expert, but this article seems to describe a Python library doing just that (last sample): http://effbot.org/zone/pil-image-size.htm

OTHER TIPS

No, it is not possible. You can, however, get the dimensions from the width and height attributes of img tags (when the page specifies them), but not for CSS background images.
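For the img-tag case, a rough C# sketch, assuming the page declares sizes as plain pixel numbers in the width and height attributes. `ImgTagSizes.Find` is a hypothetical helper, and regular expressions only approximate HTML parsing: an HTML parser would be more robust, and sizes set in CSS, given as percentages, or supplied through srcset are not picked up.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ImgTagSizes
{
    // Matches a whole <img ...> start tag (case-insensitive)
    static readonly Regex ImgTag = new Regex(@"<img\b[^>]*>", RegexOptions.IgnoreCase);

    // Matches src/width/height attributes with double, single, or no quotes;
    // the lookbehind stops names such as data-src from matching as src
    static readonly Regex Attr = new Regex(
        @"(?<=\s)(src|width|height)\s*=\s*[""']?([^""'\s>]+)", RegexOptions.IgnoreCase);

    // Returns every <img> that has a src and whose width and height are plain integers
    public static List<(string Src, int Width, int Height)> Find(string html)
    {
        var result = new List<(string Src, int Width, int Height)>();
        foreach (Match tag in ImgTag.Matches(html))
        {
            string src = null;
            int? width = null, height = null;
            foreach (Match a in Attr.Matches(tag.Value))
            {
                string name = a.Groups[1].Value.ToLowerInvariant();
                string value = a.Groups[2].Value;
                if (name == "src") src = value;
                else if (int.TryParse(value, out int n))
                {
                    if (name == "width") width = n;
                    else height = n;
                }
            }
            if (src != null && width.HasValue && height.HasValue)
                result.Add((src, width.Value, height.Value));
        }
        return result;
    }
}
```

Tags whose declared size is below some threshold could then be discarded as likely navigation elements without downloading anything; images with no declared size would still need their file headers read.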

Licensed under: CC-BY-SA with attribution