Question

I have difficulties with how to use RLE on sequences of symbols.

For example, I can do RLE encoding on strings like

"ASSSAAAEERRRRRRRR" 

which will be transformed to:

"A1S3A3E2R8".

But I'd like to perform RLE on strings like

"XXXYYYYY(1ADEFC)(EDCADD)(1ADEFC)(1ADEFC)(1ADEFC)"

which will be transformed to:

"X3Y5(1ADEFC)1(EDCADD)1(1ADEFC)3"

Is there is a way to reach it? This job becomes a bit easer because long strings always follows in brackets. Could give an advice to do this in C++?
If there is a better way to store values than using brackets, it will be also great if you recommend me.

Was it helpful?

Solution

You should break down this problem into smaller parts. First, you should have a function that tokenizes your stream and returns each individual part. For this example input stream:

"XXXYYYYY(1ADEFC)(EDCADD)(1ADEFC)(1ADEFC)(1ADEFC)"

this function will return the following elements, one per call:

X
X
X
Y
Y
Y
Y
Y
(1ADEFC)
(EDCADD)
(1ADEFC)
(1ADEFC)
(1ADEFC)
<eof>

If you get this function correctly implemented, then the RLE algorithm that you already implemented for single characters should be easily adapted to support longer strings.

OTHER TIPS

Since you mention your intention is to RLE encode the data to later use gzip compression and achieve better compression, my answer is don't bother encoding it first. gzip compression uses DEFLATE, which is a generalization of run-length encoding that can take advantage of runs of strings of characters. You won't get better compression for applying the same algorithm twice, and in fact you may even loose compression a bit.

If you insist in performing your own RLE, then it may be better to store the set length instead of using parenthesis. That is, instead of (1ADEFC)3 use 61ADEFC3. Also note that you intend to compress pixels, which use the full range of byte values. Keep that in mind, as an algorithm written to work with strings would not be appropiate for raw data with embedded nulls and non-printable characters all around.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top