Question

Let $S$ be a set (say positive integers $\leq$ N) and $f$ an involution ($f$ is bijective, $f\cdot f=id$, e.g. xor with a constant). $g$ is a idempotent mapping choosing an arbitrary representative element in each $f$ mapped pairs. For example $g(x)=\min (x,f(x))$ $$g: S \rightarrow \tilde S \subseteq S$$

I want to build a compact lookup table from $g$'s codomain $\tilde S$ to any problem data, taking $|\tilde S| \leq |S|$ cells in memory. Ideally I wish to construct a bijective mapping $\tilde S \rightarrow \left[0, |\tilde S| \right[$.

Can it be done efficiently in general (without hash map or scanning) ? What properties of the involution $f$ could help with that ?

Edit: I formulated the problem in the more general/formalized, hoping for a generic solution. Following D.W.'s comment, I'll give a concrete application:

I work with DNA words of $k$ nucleotides bases called $k$-mers. Since there is four bases, $k$-mer are represented as elements of $S=[0,2^{2k}[$

However DNA can be read on both strands, with opposite orientations and complementary bases ($A \leftrightarrow T$, $G \leftrightarrow C$). Going from one strand to the other can be represented by this involution (reverse-complement, here for 5-mers): $$ f(x) = \text{reverse}_2(x) \oplus 0\text b1010101010 $$ where $\text{reverse}_2(abcdefghij)=ijghefcdab$ inverse the order of 2-bits blocks and $\oplus$ is the bitwise XOR.

Since many applications don't distinguish between a $k$-mers and its reverse-complement, a canonical $k$-mer is picked with $g(x)=\min (x,f(x))$. The cardinality of $g$'s co-domain is: $$ \left|\tilde{S}\right|=\begin{cases} 2^{2k-1} & \text{if }k\text{ is odd}\\ 2^{k-1}\left(2^{k}+1\right) & \text{if }k\text{ is even} \end{cases} $$

In practice saving less than one addressing bit is not worth a complex solution. But cache locality is a good thing to have. $g$ can be chosen differently if it help with that.

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange
scroll top