Question

Reading All possible C# array initialization syntaxes, I wondered why C# always infers an array of int/Int32 where a smaller data type such as byte or short would suffice.

new[] { 30, 130, 230 }         // byte[] suffices but becomes int[]
new[] { -1, 125, -119 }        // sbyte[] suffices but becomes int[]
new[] { -31647, -1337, 23456 } // short[] suffices but becomes int[]

In the referenced question, Eric Lippert states that the 'best type' is used (see below), but how is int the best possible type? If we are going for overkill, why not use long then?

The type of the array element is inferred by computing the best type, if there is one, of all the given elements that have types. All the elements must be implicitly convertible to that type.

I would suspect that processing 8- or 16-bit data types could be faster than 32-bit ones, e.g. when using SIMD, where four byte values fit in the register space of one int/Int32. I know that SSE instructions are not (widely) used by the JIT compiler, but this 'int everywhere' approach means such optimizations will not help much if the JIT compiler ever adds them.
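To put the register-space argument in concrete terms, here is a minimal sketch, assuming the System.Numerics.Vector<T> API is available (with using System.Numerics;); the lane counts are examples for 256-bit SIMD registers:

Console.WriteLine(Vector<byte>.Count); // e.g. 32 byte lanes per SIMD register
Console.WriteLine(Vector<int>.Count);  // e.g. 8 int lanes in the same register width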

Could someone elaborate on this and explain why array type inference always resorts to int?

// Edit // I don't really care about the specification that prescribes that a literal without a suffix should be considered an int. To rephrase the question:

Why are data types used that are larger than needed? Why does the specification have this rule for literals? What are the advantages, given that the huge downside is giving up future (SIMD) optimizations?


Solution

Why are datatypes used that are larger than needed?

The number of line-of-business applications where you're doing a calculation in integers and can guarantee that the result will fit into a byte or short is vanishingly small. The number of line-of-business applications where the result of an integer calculation fits into an int is enormous.

Why does the specification have this rule for literals?

Because it is a perfectly sensible rule. It is consistent, clear and understandable. It makes a good compromise between many language goals such as reasonable performance, interoperability with existing unmanaged code, familiarity to users of other languages, and treating numbers as numbers rather than as bit patterns. The vast majority of C# programs use numbers as numbers.

What are the advantages, given that the huge downside is giving up future (SIMD) optimizations?

I assure you that not one C# programmer in a thousand would list "difficulty of taking advantage of SIMD optimizations" as a "huge downside" of C#'s array type inference semantics. You may in fact be the only one. It certainly would not have occurred to me. If you're the kind of person who cares so much about it then make the type manifest in the array initializer.

C# was not designed to wring every last ounce of performance out of machines that might be invented in the future, and particularly was not designed to do so when type inference is involved. It was designed to increase productivity of line-of-business developers, and line-of-business developers don't think of columnWidths = new [] { 10, 20, 30 }; as being an array of bytes.
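For example, a minimal sketch of the difference between inferred and manifest element types (variable names are illustrative):

var inferred = new[] { 10, 20, 30 };          // best common type of the literals: int[]
var columnWidths = new byte[] { 10, 20, 30 }; // manifest element type: the constants fit in byte, so byte[]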

OTHER TIPS

C# 5.0 spec 2.4.4.2

• If the literal has no suffix, it has the first of these types in which its value can be represented: int, uint, long, ulong.

• If the literal is suffixed by U or u, it has the first of these types in which its value can be represented: uint, ulong.

• If the literal is suffixed by L or l, it has the first of these types in which its value can be represented: long, ulong.

• If the literal is suffixed by UL, Ul, uL, ul, LU, Lu, lU, or lu, it is of type ulong.

All of your examples hit the first in that list... int.

All integral literals follow this rule, which is why var i = 10; is inferred as int too.
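A small sketch of how that rule plays out (variable names are just for illustration):

var a = 10;          // int   - no suffix, fits in int
var b = 3000000000;  // uint  - no suffix, too big for int, fits in uint
var c = 5000000000;  // long  - no suffix, too big for uint, fits in long
var d = 10U;         // uint  - U suffix
var e = 10L;         // long  - L suffix
var f = 10UL;        // ulong - UL suffix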

When you write an integer value without any suffix, such as 30, 130, or 230, you declare an Int32 value; so

new[] { 30, 130, 230 }; // <- array of ints

and if you want an array of bytes you have to state the type explicitly:

  new byte[] { 30, 130, 230 }; // <- treat each value as byte
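Note that this compiles only because each constant fits in byte's range (0–255); a constant outside that range is a compile-time error, for example:

var ok = new byte[] { 30, 130, 230 };     // each constant fits in byte, so it compiles
// var bad = new byte[] { 30, 130, 300 }; // error CS0031: Constant value '300' cannot be converted to a 'byte'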

The literals you use as examples all have type System.Int32; while the values could be stored without loss in narrower integral types (e.g. System.Int16), the syntax says System.Int32.

As all the specified members of each array are System.Int32, the array has type System.Int32[].

Of course it would be possible to define a language where integral literals (without other indication such as suffixes) have the type "the smallest integral type sufficient to hold the value", but that language is not C#.

In the latest – V5.0 – C# Language specification (from my VS2013 installation), in section 2.4.4.2:

Integer literals are used to write values of types int, uint, long, and ulong.

I.e., there is no way to write a byte, sbyte, short, or ushort literal without a cast.
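A quick sketch of what that means in practice (variable names are illustrative):

byte b = 42;           // allowed: the constant int literal 42 converts implicitly to byte
var stillInt = 42;     // inferred as int - there is no byte literal suffix
var asByte = (byte)42; // a cast is needed to get a byte-typed expression for inference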

I believe that operations will always be faster when run at the native bit size, so int on 32-bit machines; hence the convention.

This also implies that for 64-bit applications, Int64 would be a better choice than int for arrays.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow