Question

I find this a bit difficult to describe, but I am interested in the following idea :

The LZ algorithm factors (verb) an input stream into adjacent factors, these are by definition the maximal prefixes of the piece of text that occur in the previous text (or equivalently the previous concatenation of LZ factors).

I know and believe that in the long term (given infinite input and infinite window) this coding scheme can achieve the Shannon limit, that it will find all repeat patterns that exist.

However in any given finite text (but with an unbounded window) how optimal is this?

Do the choice of factors earlier in the input have potential detrimental effects later on? For instance, could LZ converge to a choice of factors that omits certain larger factors, or factor-choices that would result in a better cover of the input (i.e. a choice of factors that cover more of the text?).

Or is the optimality of LZ only constrained by the window limit, and the finite nature of a text? Please provide some kind of hand waving or intuitive proof.

Was it helpful?

Solution

If you apply the LZ77 algorithm twice, it doesn't find any new repeat patterns the second time. You can show that if there were any repeat pattern present after the LZ77 algorithm has been applied. this would have arisen from a repeat pattern in the original data which LZ77 would have found.

The DEFLATE algorithm follows LZ77 with Huffman coding, and does better for finite strings than LZ77 alone. So LZ77 can be improved upon for finite strings fairly easily, just not by using another iteration of LZ77.

However, the Huffman algorithm doesn't use repeat patterns. If you're asking the question of whether, for finite strings, the LZ77 algorithm does as well as any other algorithm based solely on finding repeat patters, I don't know if anybody has looked at this. The problem isn't quite well-defined as stated, but somebody might be able to prove some interesting theorems about this.

OTHER TIPS

The book Elements of information theory by Cover and Thomas contains some information on the Lempel-Ziv algorithm, including a proof of its asymptotic optimality (see Chapter 13).

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange
scroll top