Question

Note: This is a hypothetical discussion. I don't actually want to implement a struct String.

The .NET String class could be a value type (a struct), because it is immutable and has few members. But String isn't a value type, probably because it was designed before nullable types were introduced, or possibly to match the behavior of Java strings.

Would it be beneficial to change String to a value type or implement a value-type variant of String? It would remove a level of indirection and match the common non-nullable case.


Solution

Short Answer

A string has to have a reference type member (e.g., a char[]) in order to be of variable size. Thus any struct String type would really just be a reference type disguised as a value type anyway.


Medium Answer

I discussed this in more depth here. But the gist was: yes, you could have a string "value type," presumably something like this:

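using System.Collections.Generic;
using System.Linq;

public struct String
{
    // The only field is a reference to a heap-allocated array.
    private readonly char[] m_characters;

    public String(IEnumerable<char> characters)
    {
        // Copy the input so the struct owns its own buffer.
        m_characters = characters.ToArray();
    }

    public char this[int index]
    {
        get { return m_characters[index]; }
    }

    // All those other string functions... IndexOf, Substring, etc.
}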

...but there's really no point. The above is essentially a reference type (a wrapper around a char[]) nestled inside a shell that looks deceptively like a value type. Moreover, a type designed this way gets the drawbacks of a value type (e.g., potential for boxing) with none of the benefits: an instance of the above String type still needs the same heap allocation as the char[] it wraps, so it buys you nothing from a GC standpoint either.
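To make the "reference type in disguise" point concrete, here is a minimal sketch. StructString is a hypothetical stand-in for the struct above (renamed so it doesn't clash with System.String), and the Buffer property exists only so the demo can inspect the backing array; neither is part of any real API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the struct String above.
public readonly struct StructString
{
    private readonly char[] _characters;

    public StructString(IEnumerable<char> characters)
    {
        _characters = characters.ToArray();
    }

    // Exposed only so the demo can look at the backing array.
    public char[] Buffer => _characters;
}

public static class WrapperDemo
{
    // Copying the struct copies a single reference, so both copies
    // point at the same heap-allocated array.
    public static bool CopiesShareBuffer()
    {
        var a = new StructString("zebra");
        var b = a; // value copy of the struct itself
        return ReferenceEquals(a.Buffer, b.Buffer);
    }

    // Struct fields are zero-initialized, so the default value wraps null.
    public static bool DefaultHasNullBuffer()
    {
        return default(StructString).Buffer == null;
    }

    public static void Main()
    {
        Console.WriteLine(CopiesShareBuffer());    // True
        Console.WriteLine(DefaultHasNullBuffer()); // True
    }
}
```

So the struct saves you the object header of a wrapper class at best, while the characters themselves still live on the heap. It also gains a quirk of its own: default(StructString) wraps a null array, which every member would have to guard against.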

OTHER TIPS

No. Value types in .NET must have a size known at compile time. The length of a string is often known only at run time, so a string cannot be modeled as a value type.

Additionally, a value type in .NET can have only one size. Put more simply, different instances of the same value type cannot have different sizes. This means that you'd need to represent strings of different lengths as different types. For example, "dog" and "zebra" would be different, incompatible types.
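A rough sketch of what scenario #1 would force on you: the capacity becomes part of the type. Chars3 and Chars5 are hypothetical types invented for this illustration, and the sketch assumes a runtime where System.Runtime.CompilerServices.Unsafe is available (e.g. .NET 5 or later).

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical inline-storage "strings": the length is baked into the type.
public struct Chars3 { public char C0, C1, C2; }
public struct Chars5 { public char C0, C1, C2, C3, C4; }

public static class FixedSizeDemo
{
    // The size is a property of the type, not of any particular instance.
    public static int SizeOfChars3() { return Unsafe.SizeOf<Chars3>(); }
    public static int SizeOfChars5() { return Unsafe.SizeOf<Chars5>(); }

    public static void Main()
    {
        var dog = new Chars3 { C0 = 'd', C1 = 'o', C2 = 'g' };
        var zebra = new Chars5 { C0 = 'z', C1 = 'e', C2 = 'b', C3 = 'r', C4 = 'a' };

        Console.WriteLine(dog.C0 + ": " + SizeOfChars3() + " bytes");
        Console.WriteLine(zebra.C0 + ": " + SizeOfChars5() + " bytes");

        // Chars3 copy = zebra; // would not compile: the types are unrelated
    }
}
```

Every method that accepts "a string" would then need an overload (or a generic parameter) per length, which is why nobody models strings this way.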

Note

It seems this question can be interpreted in two ways:

  1. Make string a value type with no alternate storage
  2. Make string a value type and allow for alternate storage in an array

My answer is for scenario #1. It doesn't seem like scenario #2 holds a lot of value because it just replaces a reference type with a value type that has an embedded reference type.

This would indeed be a valid implementation.

Very naively, it could look like this:

struct String {
    readonly char[] _buffer;
    // Methods etc. …
}

There is one peculiarity when compared to the string class (apart from the fact that it cannot be null): a zero-sized string is not null-terminated! As far as I remember, .NET strings are null-terminated to facilitate interaction with legacy C APIs (WinAPI).

There is one point where a string class has an advantage: interning is easier to implement. String.Intern is a sort of builder function that, given the same string value, always returns the same string instance. That way, a comparison of two interned strings a and b can be sped up considerably: it's now sufficient to test their addresses.
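For illustration, this is what that looks like with the existing string class. The InternDemo class and its method names are made up for this example; String.Intern and ReferenceEquals are the real framework calls.

```csharp
using System;

public static class InternDemo
{
    // Builds a new string instance at run time, so the compiler cannot
    // fold it into a shared literal.
    private static string Fresh()
    {
        return new string(new[] { 'd', 'o', 'g' });
    }

    // Two separately built strings are equal in value but are distinct objects.
    public static bool FreshCopiesAreDistinct()
    {
        return !ReferenceEquals(Fresh(), Fresh());
    }

    // After interning, equal values map to one shared instance,
    // so a reference comparison is enough.
    public static bool InternedCopiesAreSame()
    {
        return ReferenceEquals(string.Intern(Fresh()), string.Intern(Fresh()));
    }

    public static void Main()
    {
        Console.WriteLine(FreshCopiesAreDistinct()); // True
        Console.WriteLine(InternedCopiesAreSame());  // True
    }
}
```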

But of course, a similar kind of string interning could be implemented for string structs, by comparing whether their character buffer shares the same address.
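A naive sketch of how that might look, under my own assumptions: PooledString, its Pool dictionary, and SameBufferAs are all hypothetical names, and the pool is neither thread-safe nor ever trimmed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical struct string whose instances can share pooled buffers.
public readonly struct PooledString
{
    // One shared buffer per distinct value (not thread-safe, never trimmed).
    private static readonly Dictionary<string, char[]> Pool =
        new Dictionary<string, char[]>();

    private readonly char[] _buffer;

    private PooledString(char[] buffer)
    {
        _buffer = buffer;
    }

    // Returns a struct that wraps the pooled buffer for this value,
    // creating the buffer on first use.
    public static PooledString Intern(string value)
    {
        char[] buffer;
        if (!Pool.TryGetValue(value, out buffer))
        {
            buffer = value.ToCharArray();
            Pool[value] = buffer;
        }
        return new PooledString(buffer);
    }

    // The fast path: interned values are equal exactly when they share a buffer.
    public bool SameBufferAs(PooledString other)
    {
        return ReferenceEquals(_buffer, other._buffer);
    }
}

public static class PooledDemo
{
    public static void Main()
    {
        var a = PooledString.Intern("dog");
        var b = PooledString.Intern("dog");
        var c = PooledString.Intern("zebra");
        Console.WriteLine(a.SameBufferAs(b)); // True
        Console.WriteLine(a.SameBufferAs(c)); // False
    }
}
```

Note that the comparison is still a reference comparison on the wrapped array, which again shows that the struct is a thin shell around a reference.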

No. Structs of any given type always have the same length. Different instances of a string do not.

Licensed under: CC-BY-SA with attribution