Question

I want to compress a DNA sequence using a technique other than the Huffman and Adaptive Huffman algorithms. I'm using C# as my programming language. Can anyone point me to a suitable algorithm? Note: the compression must be lossless.

Solution

With DNA sequences, each position has four possible states, namely:

  • Guanine (G, 00)
  • Cytosine (C, 01)
  • Adenine (A, 10)
  • Thymine (T, 11)

Two bits are enough to encode these four states, using the codes shown in parentheses. With this simple method you can pack four bases into one byte, which is a quarter of the size of storing one character per byte.
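The packing described above can be sketched in C# roughly as follows. This is a minimal illustration, not a definitive implementation: the class and method names (`DnaPacker`, `Pack`, `Unpack`) are hypothetical, it assumes the input contains only the letters G, C, A and T (real data with symbols such as N would need an extra escape mechanism), and it folds lowercase input to uppercase. The number of bases has to be stored alongside the bytes, because the last byte may be only partly filled.

```csharp
using System;

public static class DnaPacker
{
    // Index in this string is the 2-bit code from the list above:
    // G = 00, C = 01, A = 10, T = 11.
    private const string Alphabet = "GCAT";

    // Packs a sequence into 2 bits per base, four bases per byte.
    // Unused bits in the final byte are left as zero.
    public static byte[] Pack(string sequence)
    {
        var packed = new byte[(sequence.Length + 3) / 4];
        for (int i = 0; i < sequence.Length; i++)
        {
            int code = Alphabet.IndexOf(char.ToUpperInvariant(sequence[i]));
            if (code < 0)
                throw new ArgumentException(
                    "Unsupported symbol '" + sequence[i] + "' at position " + i + ".");

            // The first base of each group of four goes into the two highest bits.
            int shift = 6 - 2 * (i % 4);
            packed[i / 4] |= (byte)(code << shift);
        }
        return packed;
    }

    // Restores the original sequence; baseCount tells us where the padding starts.
    public static string Unpack(byte[] packed, int baseCount)
    {
        var chars = new char[baseCount];
        for (int i = 0; i < baseCount; i++)
        {
            int shift = 6 - 2 * (i % 4);
            int code = (packed[i / 4] >> shift) & 3;
            chars[i] = Alphabet[code];
        }
        return new string(chars);
    }
}
```

For example, `DnaPacker.Pack("GATTACA")` yields the two bytes `0x2F` and `0x98`, and `DnaPacker.Unpack(packed, 7)` returns `"GATTACA"` again.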


Update
As @kol mentioned, you can then apply practically any general-purpose compression algorithm to shrink the data further. .NET ships with built-in compression methods, including Deflate and GZip, and more can be found in the open-source SharpZipLib library.
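As a rough sketch of that second stage, the snippet below runs a byte array (for example, a 2-bit packed sequence) through `GZipStream` from `System.IO.Compression` and back. The class name `PackedDnaCompressor` and its method names are hypothetical. How much this gains depends on how repetitive the data is, so it is worth measuring on your own sequences.

```csharp
using System.IO;
using System.IO.Compression;

public static class PackedDnaCompressor
{
    // Compresses a byte array with GZip and returns the compressed bytes.
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            // The GZipStream must be disposed before reading the result,
            // otherwise the final compressed block may not be flushed.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }
            return output.ToArray();
        }
    }

    // Reverses Compress and returns the original bytes.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

Swapping `GZipStream` for `DeflateStream` gives the same compression without the GZip header and checksum.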

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow