Question

It's claimed in several texts on algorithmic complexity that prefix-free Turing machines are better for understanding randomness, at least in infinite sequences. In Nies' Computability and Randomness, the reason is given by theorems 2.2.1 and 2.2.2 on p. 83. I'll focus on the former, which states that there is a constant $c$ such that for each natural number $d$ and each string $w$ of length $\geq 2^{d+1}+d$, there is a prefix $x$ of $w$ with plain (not prefix-free) complexity $C(x) \leq |x|-d+c$. (Here $|x|$ is the length of $x$.)

That is, if complexity is defined in terms of Turing machines that accept arbitrary strings as inputs, then any sufficiently long string, no matter how complex, has initial segments with more or less arbitrary "dips" in complexity. The proof of the theorem uses a machine that exploits the length of its input string to construct the output. This is apparently the crucial point (see below).

My question: Why can't a prefix-free machine do this, too? Why doesn't prefix-free complexity also allow dips in the complexity of initial segments? I have not yet found an explanation of this point that I understand in the algorithmic complexity textbooks by Nies, Downey and Hirschfeldt (see below), Li and Vitanyi (although it may be there somewhere), or Calude (Information and Randomness). I think that Nies and D&H just consider it obvious, but I don't see why.

Some prefix-free machines: Downey and Hirschfeldt's Algorithmic Randomness and Complexity, p. 122, refers to a similar theorem proved earlier in the book, and remarks that a prefix-free machine can be thought of as one that reads an input until it's finished, without moving in the opposite direction on the tape, and without any test for a termination-of-input character or pattern. The text says that "this highlights how using prefix-free machines circumvents the use of the length of a string to gain more information than is present in the bits of a string." I guess that if there is a single tape that's read-only and moves in only one direction, then there is no way to keep a counter to measure arbitrary lengths; one would need to store a counter elsewhere on the tape, or at least overwrite something on the tape as one read it. But why must a prefix-free machine work like this?

Li and Vitanyi's An Introduction to Kolmogorov Complexity and Its Applications, 3rd ed., sect. 3.1, example 3.1.1, p. 201, describes a prefix-free machine that has three tapes: a read-only unidirectional input tape as in Downey and Hirschfeldt, a unidirectional write-only output tape, and a bidirectional read-write work tape. Can't the work tape be used to calculate the length of the input? In that case, why would there be a difference between prefix-free machines and plain Turing machines? Yet at the beginning of chapter 3 (pp. 197ff), Li and Vitanyi also treat prefix-free machines as a way of avoiding consequences related to those implied by Nies' theorem 2.2.1.
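To picture the machine model Downey and Hirschfeldt describe, here is a toy Python sketch (names and conventions are my own): a prefix-free machine is modeled as a decoder that requests input bits one at a time and must decide to halt based only on the bits it has already read, never on an end-of-input signal. If it halts after reading some string, it never looks at any extension of that string, so its domain is automatically prefix-free.

```python
def run_prefix_free(machine, bits):
    """Feed `bits` to `machine` one bit at a time.  `machine` is a
    generator that yields to request the next bit and `return`s its
    output when it halts.  It is never told the total length of `bits`;
    if it halts after consuming only a prefix, the remaining bits are
    simply never read.  (If `bits` runs out before the machine halts,
    this returns None: the machine diverges waiting for input.)"""
    it = iter(bits)
    gen = machine()
    try:
        gen.send(None)          # start the generator
        while True:
            gen.send(next(it))  # supply one bit on demand
    except StopIteration as halt:
        return halt.value       # the machine's output, if it halted

def unary_decoder():
    """Example prefix-free machine: read 1-bits until the first 0,
    output the count.  Its halting set {0, 10, 110, ...} is prefix-free:
    no halting input is a proper prefix of another."""
    n = 0
    while (yield) == 1:
        n += 1
    return n
```

For instance, `run_prefix_free(unary_decoder, [1, 1, 1, 0])` returns 3 without the decoder ever being told that the input had four bits. Such a machine can of course count the bits it reads on a work tape, but it can only count bits it has actually consumed, which is the distinction the question turns on.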


For those who would find it helpful, Nies' theorem 2.2.1 is below. (I decided to reproduce it nearly verbatim rather than using my own notation, to avoid accidentally introducing a distortion.) The proof works by showing that there is a plain machine that can use (an encoding of) the length of an input string as part of the output string. This allows the machine to generate a relatively long string from a relatively short input, because the machine uses both the contents of the input and its length as sources from which to construct the output string. As I see it, all that's needed for this trick to work is that we allow machines to calculate the lengths of input strings. Since it seems to me that a prefix-free machine (with some kind of work tape) can calculate the length of its input just as easily as a plain machine can, I don't see why this theorem doesn't hold for prefix-free complexity as well.

Proposition. There is a constant $c$ with the following property. For each $d \in \mathbb{N}$ and each string $w$ of length at least $2^{d+1}+d$ there is an $x \preceq w$ such that $C(x) \leq |x|-d+c$.

Proof. [elided: reference to earlier notation def.] The machine $N$ is given by $N(\sigma) = \mathsf{string}(|\sigma|)\sigma$. It is sufficient to obtain a prefix $x$ of $w$ with an $N$-description of length $|x|-d$. Let $k=\mathsf{number}(w\!\upharpoonright_d)$ (so that $k \leq 2^{d+1}$), and let $x=w\!\upharpoonright_{d+k}$ be the prefix of $w$ with the length given by $d+k$ where $k$ is the number represented by the first $d$ bits of $w$. Let $\sigma$ be the string of length $k$ such that $x=x\!\upharpoonright_d \sigma$, then $N(\sigma)=x$. Thus $C_N(x) \leq |x|-d$.
(Nies 2009, p. 83)

Notation: $C$: plain complexity.  $|x|$: length of bit string $x$.  $x\preceq w$: $x$ is a prefix of $w$.  $\mathsf{string}(n)$: encoding of natural number $n$ as a bit string.  $\mathsf{number}()$: inverse of $\mathsf{string}()$. (The claim that $k \leq 2^{d+1}$ follows from the definition of $\mathsf{number}$.)  $\mathsf{string}(|\sigma|)\sigma$: the concatenation of $\mathsf{string}(|\sigma|)$ and $\sigma$.  $w\!\upharpoonright_d$: prefix of $w$ consisting of its first $d$ bits. (I'm pretty certain that's what $\upharpoonright$ means, but the definition on pp. 12 and 16 is only 95% clear to me. My interpretation of $\upharpoonright$ makes sense in this proof, though, and I can't see how any other interpretation would make sense given how it's used.)

Note: "given by $d+k$ where $k$" in the middle sentence of the proof was "given by $d+m$ where $m$" in the original; I believe "$m$" is a typo. ($m$ appears nowhere else on this page, nor near this page.)
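To make the proof concrete, here is a small Python sketch of the machine $N$. I use the usual length-lexicographic convention for $\mathsf{string}$/$\mathsf{number}$ (the $n$-th binary string in length-lexicographic order); Nies' exact encoding may differ, but any bijection with $\mathsf{number}(s) < 2^{d+1}$ for $|s| = d$ makes the argument go through.

```python
def string(n):
    """n-th binary string in length-lexicographic order:
    '', '0', '1', '00', '01', ...  (bin(n+1) with the leading 1 dropped)."""
    return bin(n + 1)[3:]

def number(s):
    """Inverse of string().  For |s| = d this is at most 2**(d+1) - 2."""
    return int('1' + s, 2) - 1

def N(sigma):
    """Nies' plain machine: output string(|sigma|) followed by sigma.
    The output draws on both the *contents* and the *length* of the input."""
    return string(len(sigma)) + sigma

def short_description(w, d):
    """Follow the proof: k = number(w[:d]), x = w[:d+k], sigma = x[d:].
    Then string(k) = w[:d], so N(sigma) = x while |sigma| = |x| - d.
    Requires |w| >= 2**(d+1) + d, which guarantees d + k <= |w|."""
    k = number(w[:d])
    x = w[:d + k]
    sigma = x[d:]
    assert N(sigma) == x and len(sigma) == len(x) - d
    return x, sigma
```

Running `short_description` on any long enough bit string exhibits the "dip": the prefix $x$ has an $N$-description $d$ bits shorter than itself, and simulating $N$ on a universal plain machine costs only the fixed constant $c$. Note that `N` is happily defined on every string `sigma`, which is exactly what a prefix-free machine is not allowed to be.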


Solution

Answering the titular question: the proof of the proposition breaks down for prefix-free machines, since $\sigma$ doesn't necessarily belong to the prefix-free code. A prefix-free machine can indeed compute, on its work tape, the length of whatever input it reads; what it cannot do is halt on every string $\sigma$, because its domain must be prefix-free, whereas the machine $N$ in the proof needs to be defined on arbitrary $\sigma$.

There are other ways of seeing that prefix-free programs are better behaved than plain programs. Suppose that we want to concatenate the outputs of two programs $P,Q$. With prefix-free programs, we can take a universal program that runs two input programs in sequence and concatenates their outputs, for a total length of $|P| + |Q| + O(1)$. With plain programs, this is impossible, since we need to separate the two programs $P,Q$ somehow, say by encoding them as $\mathit{len}(P)PQ$, where $\mathit{len}(P)$ is written in a self-terminating encoding. Thus we lose an additive term of $O(\log \min(|P|,|Q|))$.
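A toy illustration of that plain-program overhead in Python (the particular self-terminating code below, bit-doubling with an `01` terminator, is my own choice; any self-delimiting code gives the same asymptotics):

```python
def self_delim(n):
    """Self-terminating encoding of n: double each bit of its binary
    representation, then append '01' as a terminator.  Length is
    2*floor(log2(n)) + O(1) bits."""
    b = bin(n)[2:]
    return ''.join(c + c for c in b) + '01'

def concat_plain(P, Q):
    """Separate two plain programs (bit strings) by prefixing a
    self-delimiting length: costs O(log |P|) extra bits."""
    return self_delim(len(P)) + P + Q

def split_plain(s):
    """Recover P and Q from concat_plain(P, Q)."""
    i, bits = 0, ''
    while s[i:i+2] != '01':
        bits += s[i]       # the two copies of each bit agree; keep one
        i += 2
    i += 2                 # skip the '01' terminator
    n = int(bits, 2)
    return s[i:i+n], s[i+n:]
```

For example, `split_plain(concat_plain('1101', '0011'))` recovers `('1101', '0011')`, at the cost of 8 extra bits for the length header. With prefix-free programs no header is needed: the universal machine simply runs the first program until it halts, and the prefix-free property tells it exactly where $P$ ends and $Q$ begins.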

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange