You want the stride between scanlines (in bytes) to be a multiple of a DWORD (4 bytes). What you are doing right now is based entirely on the number of pixels (at least the if (w & 3)
branch suggests this) and not on the number of bytes per scanline. For a 24-bpp image I would expect to see a test of (w * 3) % 4.
If that remainder is non-zero, you need to add (4 - remainder) bytes per scanline to satisfy alignment, not the remainder itself.
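To make the arithmetic concrete, here is a minimal sketch of the padding and stride calculation. The function names bmp_row_padding and bmp_row_stride are my own, not from any library, and I am assuming a whole number of bytes per pixel (3 for 24-bpp).

```c
#include <assert.h>
#include <stddef.h>

/* Padding bytes needed after one scanline so that its length becomes a
   multiple of 4. bytes_per_pixel is 3 for a 24-bpp image.
   (Hypothetical helper, named for illustration only.) */
size_t bmp_row_padding(size_t width, size_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    size_t row_bytes = width * bytes_per_pixel;
    /* The outer % 4 maps the "already aligned" case from 4 back to 0. */
    return (4 - row_bytes % 4) % 4;
}

/* Full length of one scanline as stored in the file: pixels plus padding. */
size_t bmp_row_stride(size_t width, size_t bytes_per_pixel)
{
    return width * bytes_per_pixel + bmp_row_padding(width, bytes_per_pixel);
}
```

For example, a 24-bpp image that is 3 pixels wide has 9 bytes of pixel data per row. 9 % 4 is 1, so the row needs 3 padding bytes for a stride of 12. Note that adding just the remainder (1 byte) would give 10, which is still misaligned.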
Try replacing your branch:
if (w & 3) {
...
}
With something more along the lines of this (24-bpp image):
int scanline_padding = (4 - (w * 3) % 4) % 4; // This will be a value from 0-3
// DWORD alignment not satisfied, for each scanline add [scanline_padding] bytes
if (scanline_padding > 0) {
    for (unsigned i = 0; i < h; i++) {
        fwrite(&rgbdata[0] + (i * 3 * w), sizeof(char) * 3, w, bmp); // Nothing special here
        fwrite("\0\0\0", scanline_padding, 1, bmp); // Now for the magic
    }
}
// DWORD alignment was satisfied, so we can write the entire thing all at once
else {
    fwrite(&rgbdata[0], w * 3, h, bmp);
}
This is untested, but should work, or should at least give you some general direction...
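If you want something you can actually check, here is a rough self-contained variant that also detects short writes. The name write_padded_rows and its signature are my own invention, and I am assuming rgb points to tightly packed 24-bpp data of w * h pixels.

```c
#include <assert.h>
#include <stdio.h>

/* Writes h rows of tightly packed 24-bpp pixel data from rgb to bmp,
   padding every row to a 4-byte boundary with zero bytes.
   Returns 0 on success, -1 if any fwrite comes up short.
   (Hypothetical helper, named for illustration only.) */
int write_padded_rows(FILE *bmp, const unsigned char *rgb, unsigned w, unsigned h)
{
    static const unsigned char zeros[3] = {0, 0, 0};
    size_t row_bytes = (size_t)w * 3;
    size_t padding = (4 - row_bytes % 4) % 4;

    assert(bmp != NULL && rgb != NULL);
    for (unsigned i = 0; i < h; i++) {
        /* One row of pixel data... */
        if (fwrite(rgb + i * row_bytes, 1, row_bytes, bmp) != row_bytes)
            return -1;
        /* ...followed by 0-3 padding bytes. */
        if (padding > 0 && fwrite(zeros, 1, padding, bmp) != padding)
            return -1;
    }
    return 0;
}
```

A quick sanity check is to write a 3x2 image and confirm that exactly 24 bytes of pixel data come out (2 rows of 9 pixel bytes plus 3 padding bytes each).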