Where does the base come from in a subsequence count?

https://cs.stackexchange.com/questions/125882

29-09-2020
|

Question

If sequence X has m symbols, then X has 2^m possible subsequences.

Where does the base (2) come from?

The origin of this question is the following paragraph:

We are given two ordered sequences of symbols, and we wish to find a longest common subsequence of $X$ and $Y$.

A subsequence of $X$ is just $X$ with some (or possibly all or none) of its elements removed. For example, one subsequence of A; B; C; D; E; F; G would be B; C; E; G.

The length of a longest common subsequence of $X$ and $Y$ gives one measure of how similar these two sequences are. For example, if the two sequences are base pairs in DNA strands, then we might consider them similar if they have a long common subsequence.

If $X$ has $m$ symbols and $Y$ has $n$ symbols, then $X$ and $Y$ have $2^m$ and $2^n$ possible subsequences respectively.

Solution

You can think of a subsequence $Y$ of $X$ as a characteristic vector $\chi_Y$ that has $m$ elements, one per element of $X$. The $i$-th element of $\chi_Y$ is $1$ if the $i$-th element of $X$ is also in $Y$ and $0$ otherwise.

It is easy to see that this is a bijection: every subsequence $Y$ has a distinct characteristic vector $\chi_Y$, and every characteristic vector $\chi_Y$ is associated to a distinct sequence $Y$.

This means that the number of subsequences $Y$ is equal to the number of vectors $\chi_Y$. Since each vector $\chi_Y$ has $m$ elements, each of which can have one of $2$ possible values, this number is $2^m$.

A concrete example of a sequence $X$, a subsequence $Y$, and its characteristic vector $\chi_Y$ is the following:

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange