Pergunta

In my system I frequently operate with airport codes ("YYZ", "LAX", "SFO", etc.), they are always in the exact same format (3 letter, represented as uppercase). The system typically deals with 25-50 of these (different) codes per API request, with over a thousand allocations total, they are passed around through many layers of our application, and are compared for equality quite often.

We started with just passing strings around, which worked fine for a bit but we quickly noticed lots of programming mistakes by passing in a wrong code somewhere the 3 digit code was expected. We also ran into issues where we were supposed to do a case-insensitive comparison and instead did not, resulting in bugs.

From this, I decided to stop passing strings around and create an Airport class, which has a single constructor that takes and validates the airport code.

public sealed class Airport
{
    public Airport(string code)
    {
        if (code == null)
        {
            throw new ArgumentNullException(nameof(code));
        }

        if (code.Length != 3 || !char.IsLetter(code[0]) 
        || !char.IsLetter(code[1]) || !char.IsLetter(code[2]))
        {
            throw new ArgumentException(
                "Must be a 3 letter airport code.", 
                nameof(code));
        }

        Code = code.ToUpperInvariant();
    }

    public string Code { get; }

    public override string ToString()
    {
        return Code;
    }

    private bool Equals(Airport other)
    {
        return string.Equals(Code, other.Code);
    }

    public override bool Equals(object obj)
    {
        return obj is Airport airport && Equals(airport);
    }

    public override int GetHashCode()
    {
        return Code?.GetHashCode() ?? 0;
    }

    public static bool operator ==(Airport left, Airport right)
    {
        return Equals(left, right);
    }

    public static bool operator !=(Airport left, Airport right)
    {
        return !Equals(left, right);
    }
}

This made our code much easier to understand and we simplified our equality checks, dictionary / set usages. We now know that if our methods accept an Airport instance that it will behave the way we expect, it has simplified our method checks to a null reference check.

The thing I did notice, however, was the garbage collection was running much more often, which I tracked down to a lot of instances of Airport getting collected.

My solution to this was to convert the class into a struct. Mostly it was just a keyword change, with the exception of GetHashCode and ToString:

public override string ToString()
{
    return Code ?? string.Empty;
}

public override int GetHashCode()
{
    return Code?.GetHashCode() ?? 0;
}

To handle the case where default(Airport) is used.

My questions:

  1. Was creating an Airport class or struct a good solution in general, or am I solving the wrong problem / solving it the wrong way by creating the type? If it's not a good solution, what is a better solution?

  2. How should my application handle instances where the default(Airport) is used? A type of default(Airport) is nonsensical to my application, so I've been doing if (airport == default(Airport) { throw ... } in places where getting an instance of Airport (and its Code property) is critical to the operation.

Notes: I reviewed the questions C#/VB struct – how to avoid case with zero default values, which is considered invalid for given structure?, and Use struct or not before asking my question, however I think my questions are different enough to warrant its own post.

Foi útil?

Solução

Update: I rewrote my answer to address some incorrect assumptions about C# structs, as well as the OP informing us in comments that interned strings are being used.


If you can control the data coming in to your system, use a class as you posted in your question. If someone runs default(Airport) they will get a null value back. Be sure to write your private Equals method to return false whenever comparing null Airport objects, and then let the NullReferenceException's fly elsewhere in the code.

However, if you are taking data into the system from sources you don't control, you don't necessary want to crash the whole thread. In this case a struct is ideal for the simple fact default(Airport) will give you something other than a null pointer. Make up an obvious value to represent "no value" or the "default value" so you have something to print on screen or in a log file (like "---" for instance). In fact, I would just keep the code private and not expose a Code property at all — just focus on behavior here.

public struct Airport
{
    private string code;

    public Airport(string code)
    {
        // Check `code` for validity, throw exceptions if not valid

        this.code = code;
    }

    public override string ToString()
    {
        return code ?? (code = "---");
    }

    // int GetHashcode()

    // bool Equals(...)

    // bool operator ==(...)

    // bool operator !=(...)

    private bool Equals(Airport other)
    {
        if (other == null)
            // Even if this method is private, guard against null pointers
            return false;

        if (ToString() == "---" || other.ToString() == "---")
            // "Default" values should never match anything, even themselves
            return false;

        // Do a case insensitive comparison to enforce logic that airport
        // codes are not case sensitive
        return string.Equals(
            ToString(),
            other.ToString(),
            StringComparison.InvariantCultureIgnoreCase);
    }
}

Worse case scenario converting default(Airport) to a string prints out "---" and returns false when compared to other valid Airport codes. Any "default" airport code matches nothing, including other default airport codes.

Yes, structs are meant to be values allocated on the stack, and any pointers to heap memory basically negate the performance advantages of structs, but in this case the default value of a struct has meaning and provides some additional bullet resistance to the rest of the application.

I would bend the rules a little here, because of that.


Original Answer (with some factual errors)

If you can control the data coming in to your system, I would do as Robert Harvey suggested in the comments: Create a parameterless constructor and throw an exception when it is called. This prevents invalid data from entering the system via default(Airport).

public Airport()
{
    throw new InvalidOperationException("...");
}

However, if you are taking data into the system from sources you don't control, you don't necessary want to crash the whole thread. In this case you can create an airport code that is invalid, but make it seem like an obvious error. This would involve creating a parameterless constructor and setting the Code to something like "---":

public Airport()
{
    Code = "---";
}

Since you are using a string as the Code, there is no point in using a struct. The struct gets allocated on the stack, only to have the Code allocated as a pointer to a string in heap memory, so no difference here between class and struct.

If you changed the airport code to a 3 item array of char's then a struct would be fully allocated on the stack. Even then the volume of data isn't that big to make a difference.

Outras dicas

Use the Flyweight pattern

Since Airport is, correctly, immutable, there is no need to create more than one instance of any particular one, say, SFO. Use a Hashtable or similar (note, I'm a Java guy, not C# so exact details might vary), to cache Airports when they get created. Before creating a new one, check in the Hashtable. You are never freeing up Airports, so GC has no need to free them.

One additional minor advantage (at least in Java, not sure about C#) is that you don't need to write an equals() method, a simple == will do. Same for hashcode().

I’m not a particularly advanced programmer, but wouldn’t this be a perfect use for an Enum?

There are different ways to construct enum classes from lists or strings. Here’s one I’ve seen in the past, not sure if it’s the best way though.

https://blog.kloud.com.au/2016/06/17/converting-webconfig-values-into-enum-or-list/

One of the reasons you're seeing more GC activity is because you're creating a second string now -- the .ToUpperInvariant() version of the original string. The original string is eligible for GC right after the constructor runs and the second one is eligible at the same time as the Airport object is. You might be able to minimize it in a different fashion (note the third parameter to string.Equals()):

public sealed class Airport : IEquatable<Airport>
{
    public Airport(string code)
    {
        if (code == null)
        {
            throw new ArgumentNullException(nameof(code));
        }

        if (code.Length != 3 || !char.IsLetter(code[0])
                             || !char.IsLetter(code[1]) || !char.IsLetter(code[2]))
        {
            throw new ArgumentException(
                "Must be a 3 letter airport code.",
                nameof(code));
        }

        Code = code;
    }

    public string Code { get; }

    public override string ToString()
    {
        return Code; // TODO: Upper-case it here if you really need to for display.
    }

    public bool Equals(Airport other)
    {
        return string.Equals(Code, other?.Code, StringComparison.InvariantCultureIgnoreCase);
    }

    public override bool Equals(object obj)
    {
        return obj is Airport airport && Equals(airport);
    }

    public override int GetHashCode()
    {
        return Code.GetHashCode();
    }

    public static bool operator ==(Airport left, Airport right)
    {
        return Equals(left, right);
    }

    public static bool operator !=(Airport left, Airport right)
    {
        return !Equals(left, right);
    }
}
Licenciado em: CC-BY-SA com atribuição
scroll top