Question

My objective is to convert a 32-bit bitmap (BGRA) buffer into a PNG image in real time using C/C++. To achieve this, I used the libpng library to convert the bitmap buffer and write it to a PNG file. However, it takes a very long time (~5 seconds) to execute in a single thread on the target ARM board (quad-core processor). Profiling showed that libpng's compression step (the deflate algorithm) accounts for more than 90% of the time, so I tried to reduce it through parallelization. The end goal is to get it done in under 0.5 seconds.

Since a PNG can have multiple IDAT chunks, I thought of writing the PNG with multiple IDATs compressed in parallel. To write a custom PNG file with multiple IDATs, I adopted the following methodology:

   1. Write the PNG IHDR chunk.
   2. Write the IDAT chunks in parallel:
      i.   Split the input buffer into 4 parts.
      ii.  Compress each part in parallel using the zlib compress() function.
      iii. Compute the CRC of the chunk { "IDAT" + zlib compressed data }.
      iv.  Create the IDAT chunk, i.e. { "IDAT" + zlib compressed data + CRC }.
      v.   Write the length of the IDAT chunk created.
      vi.  Write the complete chunks in sequence.
   3. Write the IEND chunk.
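The chunk framing in steps iii to v can be sketched as below. This is a minimal illustration, not the asker's code: `png_crc32`, `put_be32` and `png_write_chunk` are hypothetical names, and the caller is assumed to supply a large enough output buffer. Two details from the PNG chunk layout are easy to get wrong: the 4-byte length field counts only the data bytes (not the type or the CRC), and the CRC covers the type plus the data. Correct framing alone is not sufficient, though, because the IDAT payloads must still concatenate into one zlib stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 as used by PNG (reflected polynomial 0xEDB88320).
   Slow but table-free; pass 0 as the initial crc. */
static uint32_t png_crc32(uint32_t crc, const uint8_t *buf, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Store a 32-bit value in big-endian (network) byte order. */
static void put_be32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v >> 24);
    dst[1] = (uint8_t)(v >> 16);
    dst[2] = (uint8_t)(v >> 8);
    dst[3] = (uint8_t)v;
}

/* Write one chunk as [length][type][data][crc] into out.
   out must have room for data_len + 12 bytes. Returns bytes written. */
static size_t png_write_chunk(uint8_t *out, const char type[4],
                              const uint8_t *data, uint32_t data_len)
{
    assert(out != NULL);
    put_be32(out, data_len);                 /* length of the data field only */
    memcpy(out + 4, type, 4);                /* chunk type, e.g. "IDAT"       */
    if (data_len > 0)
        memcpy(out + 8, data, data_len);     /* chunk data                    */
    uint32_t crc = png_crc32(0, out + 4, 4 + (size_t)data_len);
    put_be32(out + 8 + data_len, crc);       /* CRC over type + data          */
    return (size_t)data_len + 12;
}
```

Calling `png_write_chunk(out, "IEND", NULL, 0)` should produce the familiar 12-byte trailer ending in `AE 42 60 82`, which makes a convenient sanity check for the CRC routine.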

The problem is that the PNG file created by this method is invalid or corrupted. Can somebody point out:

  1. What am I doing wrong?
  2. Is there a fast implementation of zlib compression or multi-threaded PNG creation, preferably in C/C++?
  3. Is there any other way to reach the target time?

Note: The chunks are created according to the PNG specification.

Update: This method works for creating the IDAT in parallel:

    1. Add one filter byte before each row of the input image.
    2. Split the image into four equal parts. <-- may not be required if you pass a pointer to the buffer and per-part offsets instead
    3. Compress Image Parts in parallel
            (A) for the first image part
                --deflateInit(zstrm, Z_BEST_SPEED)
                --deflate(zstrm, Z_FULL_FLUSH)
                --deflateEnd(zstrm)
                --store compressed buffer and its length
                --store adler32 for current chunk, {a1=zstrm->adler} <--adler is of uncompressed data
            (B) for the second and third image parts
                --deflateInit(zstrm, Z_BEST_SPEED)
                --deflate(zstrm, Z_FULL_FLUSH)
                --deflateEnd(zstrm)
                --store compressed buffer and its length
                --strip first 2 bytes, reduce length by 2
                --store adler32 for current chunk zstrm->adler, {a2,a3 similar to A} <--adler is of uncompressed data
            (C) for the last image part
                --deflateInit(zstrm, Z_BEST_SPEED)
                --deflate(zstrm, Z_FINISH)
                --deflateEnd(zstrm)
                --store compressed buffer and its length
                --strip first 2 bytes and last 4 bytes of buffer, reduce length by 6
                --here the last 4 bytes should be equal to zstrm->adler, {a4=zstrm->adler} <--adler is of uncompressed data

    4. adler32_combine() all four parts, i.e. a1, a2, a3 & a4 <--last arg is the length of the uncompressed data used to calculate the adler32 passed as 2nd arg
    5. Store the total length of the compressed buffers <--used in calculating the CRC of the complete IDAT & written before "IDAT" in the file
    6. Append "IDAT" to Final chunk
    7. Append all four compressed parts in sequence to Final chunk
    8. Append adler32 checksum computed in step 4 to Final chunk
    9. Append CRC of Final chunk i.e.{"IDAT"+data+adler}

    To be written in png file in this manner: [PNG_HEADER][PNG_DATA][PNG_END]
    where [PNG_DATA] ->Length(4-bytes)+{"IDAT"(4-bytes)+data+adler(4-bytes)}+CRC(4-bytes)
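
Step 1 of the update (one filter byte per row) can be sketched as follows. This is an illustration under stated assumptions rather than the asker's code: `bgra_to_png_scanlines` is a hypothetical name, filter type 0 (None) is used for every row, and the B/R swap is an assumption added here because PNG truecolor-with-alpha samples are stored in R, G, B, A order while the source buffer is BGRA. Because every output row has the same size, the result can be split into four parts on row boundaries by offset alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert a BGRA image into PNG scanlines: each row becomes
   [filter byte 0][R G B A][R G B A]...
   dst must hold height * (1 + width * 4) bytes.
   stride is the distance in bytes between source rows.
   Returns the number of bytes written to dst. */
static size_t bgra_to_png_scanlines(uint8_t *dst, const uint8_t *src,
                                    size_t width, size_t height, size_t stride)
{
    assert(stride >= width * 4);
    uint8_t *p = dst;
    for (size_t y = 0; y < height; y++) {
        const uint8_t *row = src + y * stride;
        *p++ = 0;                      /* filter type 0: None */
        for (size_t x = 0; x < width; x++) {
            p[0] = row[4 * x + 2];     /* R */
            p[1] = row[4 * x + 1];     /* G */
            p[2] = row[4 * x + 0];     /* B */
            p[3] = row[4 * x + 3];     /* A */
            p += 4;
        }
    }
    return (size_t)(p - dst);
}
```

Part k of four would then start at byte offset `(k * height / 4) * (1 + width * 4)` in the output, so each worker receives whole rows.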

Solution

Even when a PNG datastream contains multiple IDAT chunks, their payloads together form a single zlib-compressed datastream. The first two bytes of the first IDAT are the zlib header, and the final four bytes of the final IDAT are the zlib Adler-32 checksum, which is computed over the entire uncompressed data (the 2-byte header is not part of it).

There is a parallel gzip (pigz) under development at zlib.net/pigz. It will generate zlib datastreams instead of gzip datastreams when invoked as "pigz -z".

With pigz you won't need to split up your input file, because the parallel compression happens inside pigz.

OTHER TIPS

In your step ii, you need to use deflate(), not compress(). Use Z_FULL_FLUSH on the first three parts, and Z_FINISH on the last part. Then you can concatenate them to a single stream, after pulling off the two-byte header from the last three (keep the header on the first one), and pulling the four-byte check values off of the last one. For all of them, you can get the check value from strm->adler. Save those.

Use adler32_combine() to combine the four check values you saved into a single check value for the complete input. You can then tack that on to the end of the stream.
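
To show why the per-part check values can be merged without re-reading the data, here is a small sketch of the arithmetic in plain C. It is only an illustration: `adler32_simple` and `adler32_join` are hypothetical names, and real code would call zlib's `adler32_combine(adler1, adler2, len2)` instead. An Adler-32 value packs two sums modulo 65521: `a` (1 plus the sum of all bytes) in the low 16 bits and `b` (the sum of every intermediate `a`) in the high 16 bits. Appending a second block of length `len2` shifts its `b` contribution by `len2 * (a1 - 1)`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u   /* largest prime below 2^16 */

/* Straightforward (unoptimized) Adler-32 of a buffer. */
static uint32_t adler32_simple(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}

/* Combine the Adler-32 of block 1 (ad1) with the Adler-32 of block 2 (ad2),
   where len2 is the uncompressed length of block 2. The result is the
   Adler-32 of block 1 followed by block 2. */
static uint32_t adler32_join(uint32_t ad1, uint32_t ad2, uint64_t len2)
{
    uint64_t a1 = ad1 & 0xffffu, b1 = ad1 >> 16;
    uint64_t a2 = ad2 & 0xffffu, b2 = ad2 >> 16;
    uint64_t rem = len2 % ADLER_MOD;

    /* a = a1 + a2 - 1 (both parts started from 1) */
    uint64_t a = (a1 + a2 + ADLER_MOD - 1) % ADLER_MOD;
    /* b = b1 + b2 + len2 * (a1 - 1) */
    uint64_t b = (b1 + b2 + rem * ((a1 + ADLER_MOD - 1) % ADLER_MOD)) % ADLER_MOD;

    return (uint32_t)((b << 16) | a);
}
```

For four parts, fold left to right: combine a1 with a2 using the length of part 2, combine that result with a3 using the length of part 3, and so on. The final value is what gets appended, big-endian, after the last compressed part.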

And there you have it.

Licensed under: CC-BY-SA with attribution