Question

I have a general organization problem with my code. I'm modeling DNA, and I've created a Nucleobase struct to store the "letter" of the DNA. For ease of use, I would prefer the following functionality for fluent creation of objects, equality checking, and testing purposes:

Nucleobase newInstance = Nucleobase.A;

and

if (someNucleobase == Nucleobase.C) {
    //do something...
}

Basically, I want them to behave like an enum, but with the functionality of a struct: methods, additional fields, etc.

Nucleobases can only be one of 4 letters. So I've implemented the following:

public struct Nucleobase : IEquatable<Nucleobase> {

    readonly Letter letter;

    Nucleobase(Letter letter) {
        this.letter = letter;
    }

    enum Letter {
        A,
        T,
        C,
        G
    }

    public static Nucleobase A => new Nucleobase(Letter.A);
    public static Nucleobase T => new Nucleobase(Letter.T);
    public static Nucleobase C => new Nucleobase(Letter.C);
    public static Nucleobase G => new Nucleobase(Letter.G);

    //More things... e.g. converting to and from string, IEquatable implementation etc
}

This works great. But it involves repetition and makes it difficult to add new values, since I'd have to change code in two places: the enum and the list of static Nucleobase properties.

Now, this is fine for Nucleobases since there will only ever be 4 values. However, now I want to model Codons in a similar fashion. A Codon is a sequence of 3 Nucleobases, so codons have many more possible values (e.g. AAT, ATA, TAA, GCT, GTC etc.).

Is there a way to better construct the static members so I can do something like: Codon newCodon = Codon.AAT; without having to type out all the possible values for both the enum and the static fields? I'm not married to it being a struct. If there's a better way to model this I'm open to shifting it.

Ideally it would be automatic by looking at a list or enum of all possible values of Codons, and automatically create the fields somehow. I also would like a similar model for Amino acids, etc.

I suspect that there's a more elegant way than using static properties, but I have been pondering it for quite some time with no solutions.

This seems like a pattern that is repeating a lot, when I have a struct with a limited set of valid values.

Note: My code is in C#, but I think the question applies to any OO language.


Solution

Since there will only ever be 4 Nucleobases, there will only ever be 64 possible Codons, right? If you are convinced you really need literals in code like Codon.AAT, I would just write a simple code-generating script in whatever scripting language you prefer and generate the C# code for the enum and static values once; this should be pretty trivial. And since the list is fixed, you will not run into the problem of a later extension being added in one place and forgotten in the other.
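As a rough illustration of the generation step, here is a minimal sketch written in C# itself rather than a separate scripting language. The class name CodonSourceGenerator is made up for this example, and the generated lines assume a hypothetical Codon type with a constructor that takes a 3-letter string; adapt the emitted text to whatever your Codon actually looks like.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper (not from the question): emits C# source text for
// one static member per possible codon.
public static class CodonSourceGenerator
{
    // The four nucleobase letters from the question.
    private static readonly char[] Letters = { 'A', 'T', 'C', 'G' };

    // Returns one line of C# per 3-letter combination (4 * 4 * 4 = 64 lines).
    public static string Generate()
    {
        var names =
            from first in Letters
            from second in Letters
            from third in Letters
            select $"{first}{second}{third}";

        var source = new StringBuilder();
        foreach (string name in names)
        {
            // Assumption: Codon has a constructor taking the letters as a string.
            source.AppendLine($"    public static Codon {name} => new Codon(\"{name}\");");
        }
        return source.ToString();
    }
}
```

Calling Console.Write(CodonSourceGenerator.Generate()) from a small console program prints the 64 member declarations, which you would paste into the Codon type (or write to a file) once.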

However, I would also consider whether a struct or class for codons which is initialized by a 3 character string parameter (like new Codon("AAT")) may not be the better alternative. The constructor can validate if the string is one of the 64 allowed values and throw an exception if not. Yes, you lose compile time checking, but will you really have a lot of places where such a literal is required, or will you have more places where a string variable s will be filled from another data source and the code will look like new Codon(s)? If it is the latter, you may not need the enum and the static variables.
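A string-initialized codon along these lines might look like the following. This is only a sketch under assumptions: the field name, the exception types, and the choice of a readonly struct are illustrative, not something the question prescribes.

```csharp
using System;

// Sketch of a Codon that is validated at construction time instead of
// being restricted to 64 predefined literals.
public readonly struct Codon : IEquatable<Codon>
{
    private const string ValidLetters = "ATCG";

    // Note: default(Codon) leaves this field null, so it is not a valid codon.
    private readonly string letters;

    public Codon(string letters)
    {
        if (letters == null)
            throw new ArgumentNullException(nameof(letters));
        if (letters.Length != 3)
            throw new ArgumentException("A codon has exactly 3 letters.", nameof(letters));
        foreach (char c in letters)
        {
            if (ValidLetters.IndexOf(c) < 0)
                throw new ArgumentException($"Invalid nucleobase '{c}'.", nameof(letters));
        }
        this.letters = letters;
    }

    // Value equality, so that == and dictionary lookups compare the letters.
    public bool Equals(Codon other) => letters == other.letters;
    public override bool Equals(object obj) => obj is Codon other && Equals(other);
    public override int GetHashCode() => letters == null ? 0 : letters.GetHashCode();
    public override string ToString() => letters ?? string.Empty;

    public static bool operator ==(Codon left, Codon right) => left.Equals(right);
    public static bool operator !=(Codon left, Codon right) => !left.Equals(right);
}
```

With this shape, new Codon("AAT") == new Codon("AAT") is true, while new Codon("AAX") throws an ArgumentException at run time rather than failing at compile time.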

(Of course, the two suggestions are orthogonal: you can have both a string constructor and the 64 Codon literals.)

OTHER TIPS

As already answered, code generation is a good technique, and trying it may also teach you something new.

Alternatively, if you don't mind code that looks a bit weird at first, you could do something along the following lines. Granted, it may not convey the most intuitive meaning, but it is the closest alternative I could come up with:

Codon c = CodonHelper.A.G.C;

The trick is to abuse names and notation slightly and allow some seemingly domain-irrelevant elements, such as Letter and Bigram, in your code. In return you get some type safety, i.e. you cannot write:

Codon c = CodonHelper.A.G;

because you will get a compile-time error.

I have not spent too much time considering any potential drawbacks of this setup, because I was primarily concerned with syntactic elegance, so I suggest you take some time to consider potentially fine-tuning this messy proof-of-concept if you end up adopting this style. Maybe you could merge some classes so you don't need internal for example, or rename Literal to Nucleobase, etc.

So, here it goes:

public enum Literal
{
    A,
    T,
    G,
    C
}

public class Letter
{
    private readonly Literal letter;

    internal Letter(Literal value)
    {
        letter = value;
    }

    public Bigram A => new Bigram(letter, Literal.A);

    public Bigram T => new Bigram(letter, Literal.T);

    public Bigram G => new Bigram(letter, Literal.G);

    public Bigram C => new Bigram(letter, Literal.C);

}

public class Bigram
{
    private readonly Literal firstLiteral;
    private readonly Literal secondLiteral;

    internal Bigram(Literal first, Literal second)
    {
        firstLiteral = first;
        secondLiteral = second;
    }

    public Codon A => new Codon(firstLiteral, secondLiteral, Literal.A);

    public Codon T => new Codon(firstLiteral, secondLiteral, Literal.T);

    public Codon G => new Codon(firstLiteral, secondLiteral, Literal.G);

    public Codon C => new Codon(firstLiteral, secondLiteral, Literal.C);
}

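public class Codon : System.IEquatable<Codon>
{
    internal Literal FirstLiteral { get; }
    internal Literal SecondLiteral { get; }
    internal Literal ThirdLiteral { get; }

    internal Codon(Literal first, Literal second, Literal third)
    {
        FirstLiteral = first;
        SecondLiteral = second;
        ThirdLiteral = third;
    }

    //Value equality: every CodonHelper.X.Y.Z access creates a new instance,
    //so reference equality would break dictionary lookups keyed by Codon.
    public bool Equals(Codon other) =>
        other != null && FirstLiteral == other.FirstLiteral
        && SecondLiteral == other.SecondLiteral && ThirdLiteral == other.ThirdLiteral;

    public override bool Equals(object obj) => Equals(obj as Codon);

    public override int GetHashCode() =>
        ((int)FirstLiteral * 4 + (int)SecondLiteral) * 4 + (int)ThirdLiteral;
}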

//Now you only need something to start with and you are ready to go!
public static class CodonHelper
{
    public static Letter A => new Letter(Literal.A);

    public static Letter T => new Letter(Literal.T);

    public static Letter G => new Letter(Literal.G);

    public static Letter C => new Letter(Literal.C);
}

//Then, you can build your amino-acid enumeration (or class, if you want more functionality).
public enum AminoAcid
{
    //Because amino acids are NOT combinations of other things (at least not in the trivial sense),
    //you will have to explicitly encode the values and their names.
    Alanine,
    Arginine,
    //...
}

//And using a helper... (or you can refactor this in any other way you see fit).
public static class AminoHelper
{
    static readonly Dictionary<Codon, AminoAcid> _codonToAminoAcidMap;

    //Prepare the mapping in code...
    static AminoHelper()
    {
        _codonToAminoAcidMap = new Dictionary<Codon, AminoAcid>();

        _codonToAminoAcidMap.Add(CodonHelper.G.C.T, AminoAcid.Alanine);
        _codonToAminoAcidMap.Add(CodonHelper.G.C.C, AminoAcid.Alanine);
        _codonToAminoAcidMap.Add(CodonHelper.G.C.A, AminoAcid.Alanine);
        _codonToAminoAcidMap.Add(CodonHelper.G.C.G, AminoAcid.Alanine);

        //...
    }

    public static IReadOnlyDictionary<Codon, AminoAcid> AminoAcidMap => _codonToAminoAcidMap;

    public static AminoAcid ToAminoAcid(this Codon codon)
    {
        return AminoAcidMap[codon];
    }
}

I like Vector Zita's answer, but I think it could have less repetition by using generics:

public enum Literal
{
    A,
    T,
    G,
    C
}
public abstract class CodonPart<startType,returnType>
{
    protected startType start;

    abstract protected returnType make(startType start, Literal literal);

    public returnType A => make(start, Literal.A);
    public returnType T => make(start, Literal.T);
    public returnType G => make(start, Literal.G);
    public returnType C => make(start, Literal.C);
}

public class CodonHelper : CodonPart<Letter, Letter>
{
    public CodonHelper()
    {
        start = null;
    }
    protected override Letter make(Letter start, Literal literal)
    {
        return new Letter(literal);
    }
}

public class Letter : CodonPart<Literal, Bigram>
{
    public Letter(Literal l)
    {
        start = l;
    }

    protected override Bigram make(Literal start, Literal literal)
    {
        return new Bigram(start, literal);
    }
}

public class Bigram : CodonPart<Letter, Codon>
{
    private readonly Literal firstLiteral;
    private readonly Literal secondLiteral;

    internal Bigram(Literal first, Literal second)
    {
        firstLiteral = first;
        secondLiteral = second;
    }

    protected override Codon make(Letter start, Literal literal)
    {
        return new Codon(firstLiteral,secondLiteral,literal);
    }
}

public class Codon
{
    internal Literal FirstLiteral { get; }
    internal Literal SecondLiteral { get; }
    internal Literal ThirdLiteral { get; }

    internal Codon(Literal first, Literal second, Literal third)
    {
        FirstLiteral = first;
        SecondLiteral = second;
        ThirdLiteral = third;
    }
}

Then usage is similar:

Codon c = new CodonHelper().A.T.G;
//or
CodonHelper helper = new CodonHelper();
Codon cc = helper.A.T.G;

Alternatively, we can make CodonHelper a singleton.

Licensed under: CC-BY-SA with attribution