Question

I'm doing some work where I'm programmatically downloading icons from sites specified in an OpenSearch document and I need to extract the first image (for now) if it is in ICO format. I'm able to read the ICO file header with no problems and slice out the first image file. However, after reading the Wikipedia entry explaining the file format I've discovered that, if the image is in bitmap format, then the file is incomplete (it's missing the header). So I need to reconstruct this header before I can save the data to a file, but I'm having a bit of difficulty.

According to the Wikipedia entry for BMP file format, the header is 14 bytes long and should contain the following:

Offset    Data
0x0000    "BM", for our intents and purposes
0x0002    Size of the bitmap file in bytes
0x0006    Dependent on the application creating the file
0x0008    Dependent on the application creating the file
0x000A    Offset of the image data/pixel array

I figured that the size of the bitmap file in bytes would be the size of the extracted image + the 14 bytes for the header, but I'm unsure what to write at 0x0006, 0x0008 and how to get the location of the pixel array to write at 0x000A.

I've read the article a few times, but I must admit my head is hurting a little. This is my first experience at doing this sort of thing. Can anybody help me work out how to get the pixel array location?


Solution

0x0006 and 0x0008 are reserved; simply put zeros there. As for 0x000A, that's the position at which the actual image data starts in the file. Normally, the header you have here is followed by the DIB header (starting at offset 0x000E), and the first four bytes of the DIB header are its size. So you take the size of the DIB header, add its starting offset (0x000E), and what you get is the position where the actual data starts. Put that at position 0x000A. (If the image has a color table, which is the case at 8 bits per pixel or fewer, the table sits between the DIB header and the pixel data, so its size has to be added as well.)
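As a rough illustration of that rule, here is a minimal Python sketch that builds the 14-byte header for a blob that starts with a DIB header. The function name `build_bmp_file_header` and the `palette_bytes` parameter are my own inventions for this example; `palette_bytes` defaults to 0, which matches the simple "DIB header size plus 0x0E" calculation.

```python
import struct

FILE_HEADER_SIZE = 14  # the BMP file header is always 14 bytes (0x0E)

def build_bmp_file_header(dib: bytes, palette_bytes: int = 0) -> bytes:
    """Return the 14-byte BMP file header for data starting with a DIB header.

    palette_bytes is a hypothetical knob for this sketch: the size of any
    color table sitting between the DIB header and the pixel data.
    """
    # The first four bytes of the DIB header hold its own size (little-endian).
    (dib_header_size,) = struct.unpack_from("<I", dib, 0)
    pixel_offset = FILE_HEADER_SIZE + dib_header_size + palette_bytes
    file_size = FILE_HEADER_SIZE + len(dib)
    # "BM", file size, two reserved 16-bit fields set to zero, pixel offset
    return struct.pack("<2sIHHI", b"BM", file_size, 0, 0, pixel_offset)
```

Writing `build_bmp_file_header(dib) + dib` to disk should then give a file that starts with a complete BMP header.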

Here is example data from a random bitmap file:

42 4D             "BM"
2E 78 08 00       Size of the entire bitmap file (0x8782E meaning 555054 bytes)
00 00             creator1, reserved
00 00             creator2, reserved
36 00 00 00       Image data starts at offset 0x36 because the next 0x28 bytes are DIB header
28 00 00 00       DIB header starts here; its size is 0x28 (40 bytes)
...               The remaining 36 bytes of the DIB header
FF FF FF          First pixel of the image (white as it happens)
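To double-check the dump above, the same bytes can be decoded with Python's `struct` module. This is only a sketch for verifying the arithmetic; the variable names are mine.

```python
import struct

# The 14 file-header bytes plus the first DIB field, copied from the dump above
raw = bytes.fromhex("42 4D 2E 78 08 00 00 00 00 00 36 00 00 00 28 00 00 00")

# "<" = little-endian, no padding: magic, file size, 2 reserved, offset, DIB size
magic, file_size, reserved1, reserved2, pixel_offset, dib_size = struct.unpack(
    "<2sIHHII", raw
)

print(magic)              # b'BM'
print(file_size)          # 555054
print(hex(pixel_offset))  # 0x36
print(dib_size)           # 40

# The pixel offset is the file header (14 bytes) plus the DIB header size
assert pixel_offset == 14 + dib_size
```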

If you take the favicon on serverfault.com as an example, you would take the part of the file between offsets 0x0016 and 0x013E and prepend it with 42 4D 36 01 00 00 00 00 00 00 36 00 00 00. That gives you a sort of correct bitmap file, and IrfanView will even display it. However, the data stored in ICO files and BMP files is not quite the same, because ICO files need to store transparency information. That is why this favicon has size 16x32 according to its DIB header rather than the expected 16x16.
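For reference, slicing out the first image could look roughly like the sketch below. It assumes the usual ICO layout (a 6-byte header followed by 16-byte directory entries); the function name `first_ico_image` is hypothetical. For a file with a single entry, the image offset comes out as 0x0016, which matches the offset used above.

```python
import struct

def first_ico_image(ico: bytes) -> bytes:
    """Return the raw bytes of the first image stored in an ICO file."""
    # ICO header: reserved (0), type (1 = icon), number of images
    reserved, ico_type, count = struct.unpack_from("<HHH", ico, 0)
    if reserved != 0 or ico_type != 1 or count < 1:
        raise ValueError("not an ICO file with at least one image")
    # First directory entry: width, height, color count, reserved,
    # planes, bits per pixel, size of image data, offset of image data
    (_width, _height, _colors, _reserved,
     _planes, _bpp, size, offset) = struct.unpack_from("<BBBBHHII", ico, 6)
    if offset + size > len(ico):
        raise ValueError("directory entry points past the end of the file")
    return ico[offset:offset + size]
```

The returned bytes start with the DIB header when the image is stored in bitmap format, so they still need the 14-byte file header prepended.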

From Wikipedia:

Images with less than 32 bits of color depth follow a particular format: the image is encoded as a single image consisting of a color mask (the "XOR mask") together with an opacity mask (the "AND mask"). The XOR mask must precede the AND mask inside the bitmap data; if the image is stored in bottom-up order (which it most likely is), the XOR mask would be drawn below the AND mask.

In our particular case this means that, of the 256 bytes following the DIB header, the first 64 bytes are not pixels (most likely they are the color table: 16 entries of 4 bytes each, if this is a 16-color image), the middle 128 bytes are the color image (the XOR mask), and the last 64 bytes are the AND mask. You could change the start of image data (offset 0x000A) to 0x76 to skip the first 64 bytes. Then you would also change the image height in the DIB header (offset 0x0016) to 0x10 to make sure the AND mask is ignored. Here these manipulations will give you a valid bitmap, pretty much like what you expected. In the general case, however, it might be better to take the masks into account rather than ignore them.
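The same manipulation can be sketched in Python. This is not a general converter: it assumes a 40-byte BITMAPINFOHEADER, an uncompressed bottom-up image, and that the bytes between the DIB header and the pixels are a color table whose size follows from the bit depth. The name `ico_dib_to_bmp` is made up for this example, and unlike the manual edit above it also cuts off the trailing AND mask instead of just leaving it unused.

```python
import struct

def ico_dib_to_bmp(dib: bytes) -> bytes:
    """Turn the bitmap data of an ICO image into a standalone BMP file."""
    (hdr_size, width, height, _planes, bpp, _compression, _image_size,
     _xppm, _yppm, clr_used, _clr_important) = struct.unpack_from(
        "<IiiHHIIiiII", dib, 0)
    if hdr_size != 40:
        raise ValueError("this sketch only handles a 40-byte DIB header")

    real_height = height // 2  # ICO height counts the color image + AND mask
    # Assumption: palettized images (<= 8 bpp) carry 4 bytes per color entry
    entries = clr_used or (1 << bpp if bpp <= 8 else 0)
    palette_bytes = 4 * entries
    stride = ((width * bpp + 31) // 32) * 4  # rows are padded to 4 bytes
    xor_size = stride * real_height          # size of the color image only

    pixels_at = hdr_size + palette_bytes
    new_dib = bytearray(dib[:pixels_at + xor_size])  # drop the AND mask
    struct.pack_into("<i", new_dib, 8, real_height)  # height field
    struct.pack_into("<I", new_dib, 20, xor_size)    # image size field

    header = struct.pack("<2sIHHI", b"BM", 14 + len(new_dib), 0, 0,
                         14 + pixels_at)
    return header + bytes(new_dib)
```

For a 16x32, 4-bit image like the one discussed here, this puts 0x76 at offset 0x000A and 0x10 at offset 0x0016, the same values as the manual edit.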

Licensed under: CC-BY-SA with attribution