Question

I have an application that reads a large binary file (1 GB on average) and compresses it into a bzip2 archive. I started out compressing these files synchronously, as I didn't want to impede performance on the client machine. Sometimes, however, these files come in bursts and I'd like to handle them as quickly as possible. So I rewrote the method to launch each compression asynchronously via a future. The futures are stored in a vector until the work completes and are then destroyed.
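For reference, a minimal sketch of what I'm doing now (simplified; `compress_to_bzip2` stands in for the actual compression routine):

```cpp
#include <future>
#include <string>
#include <vector>

// Stand-in for the actual bzip2 compression routine.
void compress_to_bzip2(const std::string& path) { /* ... */ }

int main()
{
    std::vector<std::string> incoming = { "a.bin", "b.bin", "c.bin" };

    // One future per file; each task starts on its own thread immediately.
    std::vector<std::future<void>> jobs;
    for (const auto& path : incoming)
        jobs.push_back(std::async(std::launch::async, compress_to_bzip2, path));

    // Waiting on (or destroying) the futures blocks until every task finishes.
    for (auto& job : jobs)
        job.wait();
}
```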

During my stress test, I noticed that CPU usage inevitably became a problem if, say, 5 files came in at once on a 4-core machine. The client machine was basically unusable until all operations were complete.

So, this brings me to a design question. I am inexperienced with futures and am trying to determine the best practice for mitigating high CPU usage. This is the design I have in mind, but before I go through the trouble of hammering out the semicolons, is there a feature native to futures that I am unaware of?

  • Determine how many CPUs are available on the host machine
  • Divide the number of CPUs in half so the application never uses more than 50% of the CPU
  • Use a loop on a separate thread to manage the future objects stored in a vector.
  • The loop detects when a future's task has completed (the object is no longer needed) and starts the next compression future from the vector (rough sketch below)
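
Something along these lines (a rough sketch only; names like `run_scheduler` are placeholders I made up for illustration):

```cpp
#include <algorithm>
#include <chrono>
#include <future>
#include <string>
#include <thread>
#include <vector>

// Stand-in for the actual bzip2 compression routine.
void compress_to_bzip2(const std::string& path) { /* ... */ }

void run_scheduler(std::vector<std::string> pending)
{
    // Cap concurrency at half the reported hardware threads (at least one).
    const unsigned max_jobs = std::max(1u, std::thread::hardware_concurrency() / 2);

    std::vector<std::future<void>> active;
    while (!pending.empty() || !active.empty())
    {
        // Drop futures whose tasks have finished.
        for (auto it = active.begin(); it != active.end();)
        {
            if (it->wait_for(std::chrono::seconds(0)) == std::future_status::ready)
                it = active.erase(it);
            else
                ++it;
        }

        // Start new tasks while there is spare capacity.
        while (!pending.empty() && active.size() < max_jobs)
        {
            active.push_back(std::async(std::launch::async,
                                        compress_to_bzip2, pending.back()));
            pending.pop_back();
        }

        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
}

int main()
{
    run_scheduler({ "a.bin", "b.bin", "c.bin", "d.bin", "e.bin" });
}
```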

Would this be the best way to go?

Thank you!

Solution

While you may have 4 CPUs, you have only one hard drive (unless you don't). Total performance will therefore be limited by the disk's read/write rate; multiple threads aren't going to change that.

Having a single separate thread handle all of the file IO will allow your application to remain responsive while still getting things done asynchronously. Having multiple threads all trying to talk to the same drive just makes them contend for a single resource, and odds are good that you'll make things worse, not better.
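As a sketch only (names like `CompressionWorker` and `compress_to_bzip2` are invented for illustration), a single worker thread can drain a queue of file paths while the rest of the application merely enqueues work:

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Stand-in for the actual bzip2 compression routine.
void compress_to_bzip2(const std::string& path) { /* ... */ }

class CompressionWorker
{
public:
    CompressionWorker() : worker_(&CompressionWorker::run, this) {}

    ~CompressionWorker()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();           // drains whatever is still queued, then exits
    }

    // Called from any thread whenever a new file arrives.
    void enqueue(std::string path)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(path));
        }
        cv_.notify_one();
    }

private:
    void run()
    {
        for (;;)
        {
            std::unique_lock<std::mutex> lock(mutex_);
            cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
            if (queue_.empty())
                return;                        // done_ was set and nothing is left
            std::string path = std::move(queue_.front());
            queue_.pop_front();
            lock.unlock();

            compress_to_bzip2(path);           // only this thread touches the disk
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::string> queue_;
    bool done_ = false;
    std::thread worker_;                       // declared last so it starts after
                                               // the members above are initialised
};

int main()
{
    CompressionWorker worker;
    worker.enqueue("a.bin");
    worker.enqueue("b.bin");
}   // ~CompressionWorker waits for the queue to drain
```

Shutting down cleanly is then just a matter of letting the worker go out of scope: the destructor finishes the remaining queue entries before joining.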

If you truly want your application's usage to have minimal impact on the machine, then you need to focus on your hard disk usage. You want to read 1 GB into memory, but doing that in one go will hurt any other application that tries to read from or write to the disk. So you could use low-level, platform-specific asynchronous reading APIs to read the file in smaller chunks: read perhaps 5 MB, then have the thread sleep for a few milliseconds so others get a turn, then read a few more.

Indeed, if your bzip2 compressor can handle it, you can start feeding it the data you've partially read without waiting for the whole file. That also keeps you from having to allocate and hold 1 GB of RAM all at once.
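For example, libbz2's file-oriented API (`BZ2_bzWriteOpen` / `BZ2_bzWrite` / `BZ2_bzWriteClose`) accepts data incrementally, so the chunked reads and the streaming compression combine naturally. The sketch below assumes you link against libbz2; the 5 MB chunk size and 10 ms sleep are arbitrary, and error handling is trimmed for brevity:

```cpp
#include <bzlib.h>

#include <chrono>
#include <cstddef>
#include <cstdio>
#include <thread>
#include <vector>

// Compress `src` into `dst` in 5 MB chunks, yielding the disk between reads.
bool compress_to_bzip2(const char* src, const char* dst)
{
    std::FILE* in  = std::fopen(src, "rb");
    std::FILE* out = std::fopen(dst, "wb");
    if (!in || !out)
    {
        if (in)  std::fclose(in);
        if (out) std::fclose(out);
        return false;
    }

    int bzerr = BZ_OK;
    BZFILE* bz = BZ2_bzWriteOpen(&bzerr, out, /*blockSize100k=*/9, 0, 0);
    if (bzerr != BZ_OK)
    {
        std::fclose(in);
        std::fclose(out);
        return false;
    }

    std::vector<char> chunk(5 * 1024 * 1024);
    while (bzerr == BZ_OK)
    {
        const std::size_t n = std::fread(chunk.data(), 1, chunk.size(), in);
        if (n == 0)
            break;                                   // EOF (or read error)

        BZ2_bzWrite(&bzerr, bz, chunk.data(), static_cast<int>(n));

        // Give other processes a chance at the disk before the next read.
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }

    unsigned int in_bytes = 0, out_bytes = 0;
    BZ2_bzWriteClose(&bzerr, bz, /*abandon=*/0, &in_bytes, &out_bytes);
    std::fclose(in);
    std::fclose(out);
    return bzerr == BZ_OK;
}

int main()
{
    return compress_to_bzip2("big.bin", "big.bin.bz2") ? 0 : 1;
}
```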

OTHER TIPS

If your machine becomes unusable because you have enough work for 5 threads but only 4 cores available, that's either a problem with your operating system or a sign that those 5 threads run at too high a priority. You should notice a slight slowdown, that's all. And the fans running at top speed, obviously :-)

Another possibility is that you are running out of RAM. A compression algorithm by itself shouldn't use much RAM, but if you are reading a whole gigabyte into memory, five times over, and your computer has only 4 GB, that will be a problem. In that case, change your algorithm to use less RAM; file mapping might help.
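On a POSIX system, a memory-mapped read might look roughly like this (a sketch only; `feed_compressor` is a placeholder, and the Windows equivalent would use `CreateFileMapping`/`MapViewOfFile`):

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cstddef>

// Placeholder: stream [data, data + size) into the compressor in chunks.
void feed_compressor(const char* data, std::size_t size) { /* ... */ }

// Map the file read-only instead of reading it into a 1 GB heap buffer.
// The OS pages the data in on demand and can drop it under memory pressure.
bool compress_mapped(const char* path)
{
    const int fd = ::open(path, O_RDONLY);
    if (fd < 0)
        return false;

    struct stat st {};
    if (::fstat(fd, &st) != 0) { ::close(fd); return false; }

    void* data = ::mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { ::close(fd); return false; }

    feed_compressor(static_cast<const char*>(data), st.st_size);

    ::munmap(data, st.st_size);
    ::close(fd);
    return true;
}

int main()
{
    return compress_mapped("big.bin") ? 0 : 1;
}
```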

It might be that your compression routine uses synchronisation in case it is called from multiple threads at once (i.e. it isn't reentrant). In the worst case, that synchronisation could take more time than the compression itself. Don't do that.

If you can't find a different solution, use a semaphore that allows only four, three, two, or even just one of these threads to run at a time.
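With C++20 that can be as simple as a `std::counting_semaphore`; on older compilers you would build the same gate from a mutex and a condition variable. A sketch limiting the application to two concurrent compressions (`compress_to_bzip2` is again a placeholder):

```cpp
#include <future>
#include <semaphore>   // C++20
#include <string>
#include <vector>

// Stand-in for the actual bzip2 compression routine.
void compress_to_bzip2(const std::string& path) { /* ... */ }

// Allow at most two compressions to run at the same time.
std::counting_semaphore<2> slots(2);

void compress_throttled(const std::string& path)
{
    slots.acquire();              // blocks while two jobs are already running
    compress_to_bzip2(path);
    slots.release();
}

int main()
{
    std::vector<std::string> incoming = { "a.bin", "b.bin", "c.bin", "d.bin", "e.bin" };

    std::vector<std::future<void>> jobs;
    for (const auto& path : incoming)
        jobs.push_back(std::async(std::launch::async, compress_throttled, path));

    for (auto& job : jobs)
        job.wait();
}
```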

Licensed under: CC-BY-SA with attribution