Question

I'm trying to implement LZW to compress an ASCII-based text file, and I need some help.

Let's say I have a text file containing "BABAABAAAA", and I use my code to compress it with the LZW algorithm. The output is written to a text file, which results in

66|65|256|257|65|260|65

Now the problem is that the compressed file ends up bigger than the original, because each digit of a code is stored as a separate character rather than each code being stored as a single integer. So instead of writing 65 as the single byte 01000001, the file stores it as two ASCII characters: 00110110 ('6') and 00110101 ('5'). And that's not even counting the separator ('|').
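
For reference, a minimal LZW encoder along these lines (assuming the usual setup where the dictionary starts with the 256 single-byte values) produces that output:

    def lzw_encode(data: bytes) -> list[int]:
        dictionary = {bytes([i]): i for i in range(256)}  # codes 0-255 = single bytes
        next_code = 256                                   # first free dictionary key
        w = b""
        codes = []
        for byte in data:
            c = bytes([byte])
            if w + c in dictionary:
                w += c                         # keep extending the current match
            else:
                codes.append(dictionary[w])
                dictionary[w + c] = next_code  # add the new string to the dictionary
                next_code += 1
                w = c
        if w:
            codes.append(dictionary[w])        # flush the final match
        return codes

    print(lzw_encode(b"BABAABAAAA"))  # [66, 65, 256, 257, 65, 260, 65]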

Solution

After you convert the text into a series of integers, like 66|65|256, you should not write them to a file as a series of nine characters in text mode. You should open the output file in binary mode and write the numbers in binary (with no separator marks like | at all).
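
For example, here is a minimal Python sketch (the 16-bit width and the file name are my assumptions, just for illustration):

    import struct

    codes = [66, 65, 256, 257, 65, 260, 65]

    # Each code becomes exactly two bytes (unsigned 16-bit big-endian),
    # so no separator is needed: the decoder just reads two bytes at a time.
    with open("compressed.bin", "wb") as f:
        for code in codes:
            f.write(struct.pack(">H", code))

Two whole bytes per code is the simplest option; the approaches below pack codes even tighter.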

With LZW there are two approaches: either you use a fixed binary width, which limits the size of the dictionary, or you implement the (slightly harder) variable-width version, in which each number uses as many binary digits as the NEXT free dictionary key requires.
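
Here is a sketch of the variable-width idea (the exact growth rule is my reading of the description above: each code is written with the bit length of the next free key, and a decoder must grow its width by the same rule):

    def pack_variable(codes, first_free=256):
        bits, nbits, out = 0, 0, bytearray()
        next_free = first_free              # next key the dictionary will hand out
        for code in codes:
            width = next_free.bit_length()  # e.g. 9 bits while next_free is 256..511
            bits = (bits << width) | code   # append the code, MSB first
            nbits += width
            while nbits >= 8:               # flush whole bytes
                nbits -= 8
                out.append((bits >> nbits) & 0xFF)
            next_free += 1                  # the encoder adds one entry per emitted code
        if nbits:                           # zero-pad the final partial byte
            out.append((bits << (8 - nbits)) & 0xFF)
        return bytes(out)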

Back to your example: 66|65|256|257|65|260|65 with a fixed width of 9 bits gives you 7 * 9 = 63 bits, which rounds up to 8 bytes.
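
A fixed-width packer is the same accumulator logic with a constant width, which lets you check that count (again a sketch, not a complete encoder):

    def pack_fixed(codes, width=9):
        bits, nbits, out = 0, 0, bytearray()
        for code in codes:
            bits = (bits << width) | code
            nbits += width
            while nbits >= 8:
                nbits -= 8
                out.append((bits >> nbits) & 0xFF)
        if nbits:                           # zero-pad the last byte
            out.append((bits << (8 - nbits)) & 0xFF)
        return bytes(out)

    packed = pack_fixed([66, 65, 256, 257, 65, 260, 65], width=9)
    print(len(packed))  # 8 -- 63 bits rounded up to 8 bytes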
