The space complexity of recognising Watson-Crick palindromes

https://cs.stackexchange.com/questions/1223

16-10-2019
|

Question

I have the following algorithmic problem:

Determine the space Turing complexity of recognizing DNA strings that are Watson-Crick palindromes.

Watson-Crick palindromes are strings whose reversed complement is the original string. The complement is defined letter-wise, inspired by DNA: A is the complement of T and C is the complement of G. A simple example for a WC-palindrome is ACGT.

I've come up with two ways of solving this.

One requires $\mathcal{O}(n)$ space.

Once the machine is done reading the input. The input tape must be copied to the work tape in reverse order.
The machine will then read the input and work tapes from the left and compare each entry to verify the cell in the work tape is the compliment of the cell in the input. This requires $\mathcal{O}(n)$ space.

The other requires $\mathcal{O}(\log n)$ space.

While reading the input. Count the number of entries on the input tape.
When the input tape is done reading
- copy the complement of the letter onto the work tape
- copy the letter L to the end of the work tape
(Loop point)If the counter = 0, clear the worktape and write yes, then halt
If the input tape reads L
- Move the input head to the left by the number of times indicated by the counter (requires a second counter)
If the input tape reads R
- Move the input head to the right by the number of times indicated by the counter (requires a second counter)
If the cell that holds the value on the worktape matches the current cell on the input tape
- decrement the counter by two
- Move one to the left or right depending if R or L is on the worktape respectively
- copy the Complement of L or R to the worktape in place of the current L or R
- continue the loop
If values dont match, clear the worktape and write no, then halt

This comes out to about $2\log n+2$ space for storing both counters, the current complement, and the value L or R.

My issue

The first one requires both linear time and space. The second one requires $\frac{n^2}{2}$ time and $\log n$ space. I was given the problem from the quote and came up with these two approaches, but I don't know which one to go with. I just need to give the space complexity of the problem.

The reason I'm confused

I would tend to say the second one is the best option since it's better in terms of time, but that answer only comes from me getting lucky and coming up with an algorithm. It seems like if I want to give the space complexity of something, it wouldn't require luck in coming up with the right algorithm. Am I missing something? Should I even be coming up with a solution to the problem to answer the space complexity?

Solution

Disclaimer: The following algorithm does not use the standard model for sublinear space complexity (see WP:DSPACE), because it writes to the input tape. Furthermore the set of (Watson-Crick) palindromes is not in $\mathsf{DSPACE}(\mathcal{O}(1)) = \mathsf{REG}$. Nevertheless, it's an in-place solution for many practical purposes (e.g. if each letter is a char in C).

To show that a problem has a specific space complexity, one generally needs to come up with an algorithm that has that space complexity. This may require trial and error, but a better approach is to have a good understanding of the problem you are looking at and a good amount of experience in algorithms and complexity.

For this particular example, there is a third algorithm that does not need any additional space and requires $O(n^2)$ time complexity. This algorithm would be constant space.

Hint: why use additional space, when you can use the space occupied by the input?

Hint: zip back-and-forth across the string checking one character at a time – use the state of the Turing Machine to remember which character you are checking and erase characters you've already checked.

OTHER TIPS

The way the question is asked you should come up with an upper bound and a lower bound on the space complexity.

The first part is usually done by presenting an algorithm for your problem but a reduction to some other well studied problem would also work (and indirectly yield an algorithm, too). Since you don't consider a combined complexity (time and space) only the space matters, even if the time increases. So you found an upper bound of $\mathcal{O}(\log n)$.

The second part is usually a lot trickier, but you can easily show that constant space is not enough, because this would make your language regular. Using the pumping lemma with, say $a^lb^{2l}a^l$, where $l$ is the assumed pumping number will do the trick. This still leaves a large gap between $\omega(1)$ and $\mathcal{O}(\log n)$.

I found an exercise (see Exercise 6) which gives some hints. If I interpret their hint correctly, try proving that there are to many different equivalence classes of the Myhill-Nerode relation for each input size. This is similar to the argument that you can't distinguish more than $c\cdot\Gamma^{s(n)}$ strings of length $n$ (where $\Gamma$ is your tape alphabet and $s(n)$ your space complexity). This will give you a lower bound of $\Omega(\log n)$.

Note that you don't need to care about the complement of letters since this operation is trivial, so everything that works for ordinary palindromes can be modified with a few more states to solve your problem.

It is extremely likely that the classical time-space tradeoff for palindromes holds also in your case. This result states that a Turing machine recognizing palindromes in space $S$ must take time $\Omega(n^2/S)$, i.e. $$ \textrm{Time} \times \textrm{Space} = \Omega(n^2). $$ In your case, the first algorithm has $TS = \Theta(n^2)$, and the second has $TS = \Theta(n^2\log n)$. One can also show that the lower bound for space is $\Omega(\log n)$. I haven't been able to found what is the best upper bound - that is, whether there is an algorithm using logarithmic space and time $O(n^2/\log n)$, or in general, for what values of $S$ can we achieve time $\Omega(n^2/S)$. As an exercise, you can try finding other, hybrid algorithms interpolating your two algorithms.

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange