Does C# Collection Initialization Syntax Avoid Default Initialization Overhead?
06-07-2019
Question
When you use the new C# collection initialization syntax:
string[] sarray = new[] { "A", "B", "C", "D" };
does the compiler avoid initializing each array slot to the default value, or is it equivalent to:
string[] sarray = new string[4]; // all slots initialized to null
sarray[0] = "A";
sarray[1] = "B";
sarray[2] = "C";
sarray[3] = "D";
Solution
The compiler still emits the newarr IL instruction, so the CLR will still zero-initialize the array.
Collection initialization is just compiler magic: the CLR knows nothing about it, so it still assumes it has to clear the memory for safety.
However, this should be very fast, since it's just wiping memory. I doubt it's a significant overhead in most situations.
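A minimal sketch of the zeroing described above, assuming a .NET 5 or later runtime (needed only for GC.AllocateUninitializedArray<T>). The names ZeroInitDemo and AllDefault are hypothetical. It checks that a fresh new string[4] holds only nulls before any assignment, and that even the "uninitialized" allocation API still zeroes arrays whose element type is a reference type such as string.

```csharp
using System;
using System.Collections.Generic;

public static class ZeroInitDemo
{
    // Hypothetical helper: true if every slot still holds default(T).
    public static bool AllDefault<T>(T[] array)
    {
        foreach (T item in array)
        {
            if (!EqualityComparer<T>.Default.Equals(item, default))
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        // newarr returns memory the runtime has already cleared,
        // so all four slots are null before any initializer stores run.
        string[] fresh = new string[4];
        Console.WriteLine(AllDefault(fresh)); // True

        // GC.AllocateUninitializedArray<T> (.NET 5+) may skip zeroing, but only
        // for element types without references; for string it still zeroes.
        string[] stillZeroed = GC.AllocateUninitializedArray<string>(4);
        Console.WriteLine(AllDefault(stillZeroed)); // True
    }
}
```

In other words, for an array of strings there is no supported way to skip the clearing step; the runtime must guarantee the GC never sees garbage references.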
OTHER TIPS
Quick test:
string[] arr1 =
{
"A","B","C","D"
};
arr1.GetHashCode();
string[] arr2 = new string[4];
arr2[0] = "A";
arr2[1] = "B";
arr2[2] = "C";
arr2[3] = "D";
arr2.GetHashCode();
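This results in the following IL. The two sequences are essentially identical; the only difference is that the initializer version fills a compiler-generated temporary (loc.2) and then assigns it to arr1 (loc.0):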
IL_0002: newarr [mscorlib]System.String
IL_0007: stloc.2
IL_0008: ldloc.2
IL_0009: ldc.i4.0
IL_000a: ldstr "A"
IL_000f: stelem.ref
IL_0010: ldloc.2
IL_0011: ldc.i4.1
IL_0012: ldstr "B"
IL_0017: stelem.ref
IL_0018: ldloc.2
IL_0019: ldc.i4.2
IL_001a: ldstr "C"
IL_001f: stelem.ref
IL_0020: ldloc.2
IL_0021: ldc.i4.3
IL_0022: ldstr "D"
IL_0027: stelem.ref
IL_0028: ldloc.2
IL_0029: stloc.0
IL_002a: ldloc.0
IL_002b: callvirt instance int32 [mscorlib]System.Object::GetHashCode()
IL_0030: pop
IL_0031: ldc.i4.4
IL_0032: newarr [mscorlib]System.String
IL_0037: stloc.1
IL_0038: ldloc.1
IL_0039: ldc.i4.0
IL_003a: ldstr "A"
IL_003f: stelem.ref
IL_0040: ldloc.1
IL_0041: ldc.i4.1
IL_0042: ldstr "B"
IL_0047: stelem.ref
IL_0048: ldloc.1
IL_0049: ldc.i4.2
IL_004a: ldstr "C"
IL_004f: stelem.ref
IL_0050: ldloc.1
IL_0051: ldc.i4.3
IL_0052: ldstr "D"
IL_0057: stelem.ref
IL_0058: ldloc.1
IL_0059: callvirt instance int32 [mscorlib]System.Object::GetHashCode()
I ran a short test on instantiating an array using the syntax you describe and found that instantiating with non-default values took about 2.2 times as long as instantiating with default values.
When I switched the initializer to default values, it took about the same amount of time as instantiating with default values.
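The timing code behind those numbers isn't shown. A minimal sketch of one way to reproduce the comparison, assuming System.Diagnostics.Stopwatch and a hypothetical iteration count; InitializerTiming, Measure and Sink are illustrative names. The delegate call adds the same overhead to every case, and absolute times and ratios will vary with runtime, JIT and hardware, so treat any figures as rough.

```csharp
using System;
using System.Diagnostics;

public static class InitializerTiming
{
    // Hypothetical iteration count: large enough to give a measurable time.
    public const int Iterations = 10_000_000;

    // Storing each result in a static field keeps the allocations observable,
    // so the JIT cannot discard them as dead code.
    public static bool[] Sink = Array.Empty<bool>();

    public static TimeSpan Measure(Func<bool[]> create)
    {
        Sink = create(); // warm-up call so JIT compilation is not timed

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            Sink = create();
        }
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main()
    {
        TimeSpan plain = Measure(() => new bool[2]);
        TimeSpan defaults = Measure(() => new[] { false, false });
        TimeSpan nonDefaults = Measure(() => new[] { true, true });

        Console.WriteLine($"new bool[2]:             {plain.TotalMilliseconds:F0} ms");
        Console.WriteLine($"new[] {{ false, false }}: {defaults.TotalMilliseconds:F0} ms");
        Console.WriteLine($"new[] {{ true, true }}:   {nonDefaults.TotalMilliseconds:F0} ms");
    }
}
```

For anything more than a rough check, a dedicated harness such as BenchmarkDotNet handles warm-up and statistical noise far better than a hand-rolled Stopwatch loop.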
Indeed, when I looked at the disassembly, it appears that the array is initialized and then populated with only the values that are not the default.
Instantiating with non-default values:
bool[] abPrimes = new[] {
true, true
};
0000007e mov edx,2
00000083 mov ecx,79114A46h
00000088 call FD3006F0
0000008d mov dword ptr [ebp-64h],eax
00000090 mov eax,dword ptr [ebp-64h]
00000093 mov dword ptr [ebp-54h],eax
00000096 mov eax,dword ptr [ebp-54h]
00000099 cmp dword ptr [eax+4],0
0000009d ja 000000A4
0000009f call 76A9A8DC
000000a4 mov byte ptr [eax+8],1
000000a8 mov eax,dword ptr [ebp-54h]
000000ab cmp dword ptr [eax+4],1
000000af ja 000000B6
000000b1 call 76A9A8DC
000000b6 mov byte ptr [eax+9],1
000000ba mov eax,dword ptr [ebp-54h]
000000bd mov dword ptr [ebp-40h],eax
Instantiating with default values:
bool[] abPrimes2 = new[] {
false, false
};
000000c0 mov edx,2
000000c5 mov ecx,79114A46h
000000ca call FD3006F0
000000cf mov dword ptr [ebp-68h],eax
000000d2 mov eax,dword ptr [ebp-68h]
000000d5 mov dword ptr [ebp-54h],eax
000000d8 mov eax,dword ptr [ebp-54h]
000000db mov dword ptr [ebp-5Ch],eax
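Put back into C#, the two listings above correspond roughly to the hand-written code below. This is a sketch of the observed behaviour, not the compiler's documented lowering: the runtime zeroes the new array, so stores appear only for slots whose value differs from the default. LoweringSketch and its methods are hypothetical names.

```csharp
using System;
using System.Linq;

public static class LoweringSketch
{
    // Roughly what new[] { true, true } turned into:
    // allocate (zeroed), then store each non-default value.
    public static bool[] NonDefaultByHand()
    {
        bool[] array = new bool[2]; // both slots start as false
        array[0] = true;
        array[1] = true;
        return array;
    }

    // Roughly what new[] { false, false } turned into: the allocation alone,
    // because false is already the value of every freshly zeroed slot.
    public static bool[] DefaultByHand()
    {
        return new bool[2];
    }

    public static void Main()
    {
        Console.WriteLine(NonDefaultByHand().SequenceEqual(new[] { true, true }));  // True
        Console.WriteLine(DefaultByHand().SequenceEqual(new[] { false, false })); // True
    }
}
```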
It is not possible to avoid initializing each array slot to the default value, at least not at the IL level.
String is a class (a reference type), not a struct.
That means "A", "B", "C", "D" and sarray itself can be stored anywhere on the heap; the array only holds references to the strings. "A", "B", "C" and "D" may come from the intern pool, so the references stored in the array are only known at run time.
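A minimal sketch of the point about references, assuming string literals are interned, as they are on the mainstream .NET runtimes. InternSketch is a hypothetical name. The array slot stores a reference to the pooled "A", while a string built at run time with equal contents is a separate heap object.

```csharp
using System;

public static class InternSketch
{
    public static void Main()
    {
        string[] sarray = new[] { "A", "B", "C", "D" };

        // The slot holds a reference to the interned literal, so it is expected
        // to be the same object as another "A" literal in this assembly.
        bool sameAsLiteral = ReferenceEquals(sarray[0], "A");

        // string.IsInterned returns the pooled instance, or null if there is none.
        bool inPool = string.IsInterned(sarray[0]) != null;

        // A string created at run time is a new heap object with equal contents.
        string built = new string(new[] { 'A' });
        bool sameObject = ReferenceEquals(built, sarray[0]); // False: separate allocation
        bool sameText = built == sarray[0];                  // True: value equality

        Console.WriteLine($"{sameAsLiteral} {inPool} {sameObject} {sameText}");
    }
}
```

Because the array only stores references, the runtime must write each reference into its slot at run time; there is no fixed blob of bytes it could copy in place of the zeroing step.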
But I believe the JIT could be smart enough to eliminate about half of this overhead.
PS: premature optimization is the root of all evil.