Question

I am looking for a robust, efficient data compression algorithm that I could use for real-time transmission of medical data (primarily waveforms - heart rate, etc.).

I would appreciate any recommendations/links to scientific papers.

EDIT: The system will be based on a server (most probably installed within the point-of-care infrastructure) and mobile devices (iOS & Android smartphones and tablets with native apps) to which the waveforms are going to be transferred. The server will gather all the data from the hospital (primarily waveform data). In my case, stability and speed are more important than latency.

That's the most detailed specification I can provide at the moment. I am going to investigate your recommendations and then test several algorithms, but I am looking for something that has been successfully implemented in a similar architecture. I am also open to any suggestions regarding server computation power or server software.

Solution

Don't think of it as real-time or as medical data - think of it as packets of data needing to be compressed for transmission (most likely in TCP packets). The details of the content only matter in the choice of compression algorithm, and even there what matters is not that it's medical but how the data is formatted/stored and what the actual data looks like. The important things are the data itself and the constraints of the overall system (e.g. is it data gathering, such as a Holter monitor, or real-time status reporting, such as a cardiac monitor in an ICU? What kind of system is receiving the data?).

Looking at the data, is it being presented for transmission as raw binary data, or is it being received from another component or device as (for example) structured XML or HL7 with numeric values represented as text? Will compressing the original data be the most efficient option, or should it be converted down to a proprietary binary format that only covers the actual data range (are 2, 3 or 4 bytes enough to cover the range of values?)? What kind of savings could be achieved by converting, and what are the compatibility concerns (e.g. loss of HL7 compatibility)?
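To make the conversion idea concrete, here is a minimal sketch of repacking text-formatted samples into fixed-width binary integers before any compressor runs. The sample payload and the choice of signed 16-bit values are illustrative assumptions, not taken from any specific device:

```python
import struct

# Hypothetical feed: samples arrive as comma-separated text (as they might
# inside an XML or HL7 export).
text_payload = "-135,-128,-110,-87,-60,-31,0,28,55,79,98,112"
values = [int(v) for v in text_payload.split(",")]

# Pack each sample as a big-endian signed 16-bit integer: exactly 2 bytes per
# sample, with no separators, regardless of how many digits the value had.
binary_payload = struct.pack(f">{len(values)}h", *values)

print(len(text_payload), "text bytes ->", len(binary_payload), "binary bytes")
```

Even before general-purpose compression, this roughly halves the payload for typical multi-digit values, at the cost of losing the text/HL7 representation on the wire.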

Choosing the absolute best-compressing algorithm may also not be worth much additional work unless you're going to be in an extremely low-bandwidth scenario; if the data is coming from an embedded device, you should balance compression efficiency against the capabilities and limitations of the embedded processor, toolset and surrounding system. If a custom-built compression routine saves you 5% over something already built into the tools, is it worth the extra coding and debugging time and the storage space in embedded flash? Existing validated software libraries that produce "good enough" output may be preferred, particularly for medical devices.

Finally, depending on the environment you may want to sacrifice a big chunk of compression in favor of some level of redundancy, such as transmitting a sliding window of the data such that loss of any X packets doesn't result in loss of data. This may let you change protocols as well and may change how the device is configured - the difference between streaming UDP (with no retransmission of lost packets) and TCP (where the sender may need to be able to retransmit) may be significant.
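One way to picture the sliding-window redundancy: have each datagram repeat the tail of the previous one, so a single lost packet mid-stream costs no waveform data. This sketch is my own illustration of the idea; the chunk and overlap sizes are arbitrary, not tuned values:

```python
CHUNK = 4      # new samples per packet (illustrative)
OVERLAP = 4    # samples repeated from the previous packet (illustrative)

def packetize(samples):
    """Split samples into overlapping packets of (first-sample-index, payload)."""
    packets = []
    for i in range(0, len(samples), CHUNK):
        start = max(0, i - OVERLAP)
        packets.append((start, samples[start:i + CHUNK]))
    return packets

def reassemble(packets):
    """Rebuild the stream from whichever packets arrived, in any order."""
    received = {}
    for start, payload in packets:
        for offset, sample in enumerate(payload):
            received[start + offset] = sample
    return [received[i] for i in sorted(received)]
```

With this layout, dropping any one packet in the middle of the stream still leaves every sample covered by a neighboring packet - which is exactly the property that makes retransmission-free UDP streaming viable.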

And, now that I've blathered about the systems side, there's a lot of information out there on packetizing and streaming analog data, ranging from development of streaming protocols such as RTP to details of voice packetization for GSM/CDMA and VOIP. Still, the most important drivers for your decisions may end up being the toolsets available to you on the device and server sides. Using existing toolsets even if they're not the most efficient option may allow you to cut your development (and time-to-market) times significantly, and may also simplify the certification of your device/product for medical use. On the business side, spending an extra 3-6 months of software development, finding truly qualified developers, and dealing with regulatory approvals are likely to be the overriding factors.

UPDATE 2012/02/01: I just spent a few minutes looking at the XML export of a 12-lead cardiac stress EKG with a total observation time of 12+ minutes and an XML file size of ~6MB. I'm estimating that more than 25% of that file was repetitive and EXTREMELY compressible XML in the study headers, and the waveform data was comma-separated numbers in the range of -200 to 200, concentrated in the center of the range and changing slowly, with the numbers crossing the y-axis and staying on that side for a time. Assuming that most of what you want is the waveform values, for this example you'd be looking at an uncompressed data rate of 4500 KB / 763 seconds, i.e. about 5.9 KB/s or roughly 47 Kbps. Completely uncompressed and using text formatting you could run that over a "2.5G" GPRS connection with ease. On any modern wireless infrastructure the bandwidth used will be almost unnoticeable.
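The back-of-the-envelope arithmetic above is easy to check (the exact figure depends on whether you count 8 bits per byte or assume serial-style ~10-bit-per-byte framing, which would push it closer to 59 Kbps):

```python
# Recompute the uncompressed data rate for the EKG example above.
file_kb = 4500   # waveform portion of the export, in KB (1 KB = 1000 bytes here)
seconds = 763    # total observation time

bytes_per_sec = file_kb * 1000 / seconds   # ~5.9 KB/s
kbps = bytes_per_sec * 8 / 1000            # 8 bits per byte on the wire

print(f"{bytes_per_sec:.0f} B/s = {kbps:.1f} Kbps")
```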

I still think that the stock compression libraries would eat this kind of data for lunch (subject to issues with compression headers and possibly packet headers). If you insist on doing custom compression, I'd look at sending difference values rather than raw numbers (unless your raw data is already offsets). If your data looks anything like what I'm reviewing, you could probably convert each item into a 1-byte value in the range -127 to +127, possibly reserving the extreme ends as "special" values used for overflow (handle those as you see fit - special representation, error, etc.). If you'd rather be slightly less efficient on transmission and insignificantly faster in processing, you could instead send each value as a signed 2-byte value, which would still use less bandwidth than the text representation, because currently every value takes 2+ bytes anyway (1-4 characters plus a separator that would no longer be needed).
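A minimal sketch of that difference-coding idea, using an escape byte for the rare out-of-range jumps. The byte layout and sentinel value here are my own assumptions, not a standard format:

```python
import struct

ESCAPE = -128  # sentinel: the next 2 bytes carry the full difference

def delta_encode(samples):
    """Encode integer samples as one signed byte of first differences each;
    differences outside [-126, 126] get an escape byte plus a 16-bit value."""
    out = bytearray()
    prev = 0
    for s in samples:
        d = s - prev
        if -126 <= d <= 126:
            out += struct.pack("b", d)
        else:
            out += struct.pack("b", ESCAPE)
            out += struct.pack(">h", d)
        prev = s
    return bytes(out)

def delta_decode(data):
    samples, prev, i = [], 0, 0
    while i < len(data):
        d = struct.unpack_from("b", data, i)[0]
        i += 1
        if d == ESCAPE:  # escaped: read the full 2-byte difference
            d = struct.unpack_from(">h", data, i)[0]
            i += 2
        prev += d
        samples.append(prev)
    return samples
```

For slowly changing waveform data like the EKG sample described above, nearly every sample fits in one byte, so this approaches a 4:1 saving over 2-4 character text values - and the output is still friendly to a general-purpose compressor afterwards.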

Basically, don't worry about the size of the data unless this is going to be running 24/7 over a heavily metered wireless connection with low caps.

OTHER TIPS

There is a category of compression software which is so fast that I see no scenario in which it can't be called "real time": they are necessarily fast enough. Such algorithms include LZ4, Snappy, LZO and QuickLZ, and they reach hundreds of MB/s per CPU.

A comparison of them is available here: http://code.google.com/p/lz4/

"Real Time compression for transmission" can also be seen as a trade-off between speed and compression ratio. More compression, even if slower, can effectively save transmission time.

A study of the "optimal trade-off" between compression and speed has been carried out on this page, for example: http://fastcompression.blogspot.com/p/compression-benchmark.html
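You can get a feel for that trade-off with nothing but the standard library. This sketch uses zlib's compression levels as a stand-in (LZ4 and Snappy need third-party bindings and sit at the even-faster, lighter-compression end of the same curve); the repetitive sample payload is made up for illustration:

```python
import time
import zlib

# Synthetic, highly repetitive payload standing in for waveform text data.
data = b"heart-rate,72,73,73,74,72,71,70,70\n" * 5000

# Lower levels trade compression ratio for speed; higher levels do the reverse.
for level in (1, 6, 9):
    t0 = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed_ms = (time.perf_counter() - t0) * 1000
    print(f"level {level}: {len(data)} -> {len(packed)} bytes in {elapsed_ms:.1f} ms")
```

If the slower, tighter level shrinks the payload enough to save more transmission time than it costs in CPU time, it wins; otherwise the fast level does.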

I tested many compression libraries, and this is my conclusion:

LZO (http://www.oberhumer.com/opensource/lzo/) is very fast when compressing large amounts of data (more than 1 MB).

Snappy (http://code.google.com/p/snappy/) is good but requires more processing resources for decompression (better suited to data under 1 MB).

http://objectegypt.com offers a library called IHCA, which is faster than LZO at compressing big data, offers good decompression speed and requires no license.

Finally, you'd do better to write your own compression functions, because no one knows your data better than you do.

Licensed under: CC-BY-SA with attribution