Several solutions are possible, but you need to specify the format of your images - grayscale at what bit depth? 8 bits? 12 bits? 16 bits?
Most other answers completely miss the mark by ignoring the physical reality of what you're trying to do: the bandwidth, both in terms of I/O and processing, is of primary importance.
Did you verify the storage bandwidth available on your system, in realistic conditions? It is generally a bad idea to store this stream on the same drive your operating system lives on, because the seeks caused by other applications will eat into your bandwidth. Remember that on a 50+ Mbyte/s hard drive with 5ms seeks, one seek costs you 0.25 Mbytes that could have been transferred in that time, and that's rather optimistic, since "run of the mill" hard drives read faster and seek slower, on average. I'd say 1 Mbyte lost per seek is a conservative estimate on yesteryear's consumer drives.
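To make the seek arithmetic concrete, here is a minimal sketch. The function names and the figure of 40 seeks per second are made up for illustration. Only the 50 Mbyte/s and 5ms numbers come from the paragraph above.

```python
def seek_cost_mb(throughput_mb_s: float, seek_ms: float) -> float:
    """Data (in MB) the drive could have transferred during one seek."""
    return throughput_mb_s * seek_ms / 1000.0


def effective_throughput_mb_s(throughput_mb_s: float, seek_ms: float,
                              seeks_per_s: float) -> float:
    """Sustained throughput left once seek time is subtracted from each second."""
    # Fraction of every second spent seeking instead of transferring data.
    seeking_fraction = min(1.0, seeks_per_s * seek_ms / 1000.0)
    return throughput_mb_s * (1.0 - seeking_fraction)


if __name__ == "__main__":
    # 50 MB/s drive, 5 ms per seek: the 0.25 MB figure quoted above.
    print(seek_cost_mb(50, 5))
    # Hypothetical load: other applications force 40 seeks per second.
    print(effective_throughput_mb_s(50, 5, 40))
```

This is only a back-of-the-envelope model: it ignores caching and command queuing, so measure your actual drive under load before trusting any number it gives you.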
If you need to write raw frames and don't want to compress them even in a lossless fashion, then you need a storage system that can support the requisite bandwidth. Assuming 8 bit grayscale, you'll be dumping 2Mbytes/frame, at 50Hz that's 100Mbytes/s. A striped RAID 0 array of two contemporary off-the-shelf drives should be able to cope with it without problems.
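The bit depth matters because it changes the raw data rate directly. A rough sketch of the calculation follows. The 2,000,000 pixels per frame is my inference from the 2 Mbytes/frame figure at 8 bits, not something you stated, and the `packed` option (storing 12-bit samples without padding them to 16 bits) is an assumption about how your capture hardware might deliver data.

```python
import math


def raw_bandwidth_mb_s(pixels_per_frame: int, bits_per_pixel: int,
                       fps: float, packed: bool = False) -> float:
    """Uncompressed data rate in MB/s (1 MB = 10**6 bytes)."""
    if packed:
        # Samples stored back to back, e.g. two 12-bit pixels in 3 bytes.
        bytes_per_frame = math.ceil(pixels_per_frame * bits_per_pixel / 8)
    else:
        # Each sample padded up to a whole number of bytes.
        bytes_per_frame = pixels_per_frame * math.ceil(bits_per_pixel / 8)
    return bytes_per_frame * fps / 1e6


if __name__ == "__main__":
    pixels = 2_000_000  # assumed frame size, see above
    for bits in (8, 12, 16):
        print(bits, "bits:", raw_bandwidth_mb_s(pixels, bits, 50), "MB/s")
    print("12 bits packed:", raw_bandwidth_mb_s(pixels, 12, 50, packed=True), "MB/s")
```

With unpacked 12 or 16 bit samples the rate doubles relative to 8 bits, which changes how many drives you need in the array.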
If you are OK with burning some serious CPU or GPU for compression, but still want lossless storage, then JPEG2000 is the default choice. If you use a GPU implementation, it will leave your CPU alone for other things. I'd think the expected bandwidth reduction is 2x, so your RAID 0 will have plenty of bandwidth to spare. That would be the preferred way to use it - it will be very robust and you won't be losing any frames no matter what else the system is doing (within reason, of course).
If you are OK with lossy compression, then off-the-shelf JPEG libraries will do the trick. You'd probably want a 4x reduction in size relative to the raw stream, and the resultant 25Mbytes/s data stream can be handled by the hard drive the OS lives on.
As for the implementation: two threads are enough if there's no compression. One thread captures the images, another one dumps them to the drive. If you see no improvement compared to a single thread, then it's solely due to the bandwidth limitations of your drive. If you use a GPU for compression, then one thread that handles compression is enough. If you use the CPU for compression, then you need as many compression threads as there are cores.
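A minimal sketch of the two-thread, no-compression case, in Python for brevity. Everything here is illustrative: `capture_frames` is a hypothetical stand-in for whatever your camera API provides, and `record` and `max_queued` are names I made up. The idea is a bounded queue between the capture thread and the writer thread, so that a slow drive shows up as counted dropped frames rather than unbounded memory growth.

```python
import queue
import threading


def capture_frames(n_frames, frame_bytes):
    """Hypothetical stand-in for a camera: yields dummy frames."""
    for i in range(n_frames):
        yield bytes([i % 256]) * frame_bytes


def record(path, frames, max_queued=64):
    """Capture in the calling thread, write to disk in a second thread.

    Returns the number of frames dropped because the writer fell behind.
    """
    pending = queue.Queue(maxsize=max_queued)
    dropped = 0

    def writer():
        with open(path, "wb") as out:
            while True:
                frame = pending.get()
                if frame is None:  # sentinel: capture has finished
                    break
                out.write(frame)

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()
    for frame in frames:
        try:
            pending.put_nowait(frame)
        except queue.Full:
            dropped += 1  # the drive cannot keep up with the capture rate
    pending.put(None)  # blocks until there is room, then stops the writer
    writer_thread.join()
    return dropped


if __name__ == "__main__":
    lost = record("frames.raw", capture_frames(100, 2_000_000))
    print("dropped frames:", lost)
```

If `record` reports dropped frames even with a generous queue, the bottleneck is the drive, not the threading.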
There is no issue at all with storing image differences, in fact JPEG2k loves this and you may get an overall 2x compression improvement (for a total factor of 4x) if you're lucky. What you do is store a bunch of difference frames for each reference frame stored in full. The ratio is based solely on the needs of the processing done afterwards - you're trading off resilience to data loss and interactive processing latency for decreased bandwidth at storage time.
I'd say a ratio anywhere between 1:5 and 1:50 is reasonable. With the latter, at 50Hz the loss of the reference frame knocks out 1s worth of data, and randomly seeking anywhere in the data requires on average a read of a reference frame and 24 delta frames, plus the cost of decompressing 25 frames.
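Here is a rough sketch of the reference-plus-differences scheme for 8 bit frames, assuming each difference is taken against the previous frame (which is what makes a random seek read every delta back to the reference). The function names are mine, and the per-byte loops are for clarity only. A real implementation would use vectorized array operations and hand each output frame to the compressor.

```python
def encode_group(frames):
    """Turn equal-sized 8-bit frames into [reference, delta1, delta2, ...].

    Each delta is the byte-wise difference to the previous frame, taken
    modulo 256 so that the encoding stays lossless.
    """
    encoded = [frames[0]]
    for prev, cur in zip(frames, frames[1:]):
        encoded.append(bytes((c - p) % 256 for p, c in zip(prev, cur)))
    return encoded


def decode_frame(encoded, index):
    """Rebuild frame `index` by applying deltas 1..index to the reference."""
    frame = encoded[0]
    for delta in encoded[1:index + 1]:
        frame = bytes((p + d) % 256 for p, d in zip(frame, delta))
    return frame


def mean_deltas_per_seek(group_size):
    """Average number of deltas read to reach a uniformly random frame."""
    return (group_size - 1) / 2


if __name__ == "__main__":
    frames = [bytes([10, 20, 250]), bytes([12, 18, 3]), bytes([12, 19, 0])]
    group = encode_group(frames)
    assert all(decode_frame(group, i) == f for i, f in enumerate(frames))
    print(mean_deltas_per_seek(50))  # one reference per 50 frames
```

For one reference per 50 frames the average works out to roughly the 24 delta frames mentioned above.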