Question

I know that all arrays in .NET are limited to 2 GB. Under this premise, I try not to allocate more than n = ((2^31) - 1) / 8 doubles in an array. Nevertheless, that number of elements still doesn't seem to be valid. Does anyone know how I can determine at run time the maximum number of elements, given sizeof(T)?
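
To illustrate, here is a C# rendering of the failing case (my actual code is C++/CLI, see the update below):

// Just under 2 GB worth of doubles -- this still throws OutOfMemoryException,
// even though it respects the naive ((2^31) - 1) / 8 bound.
int n = (int)(((1L << 31) - 1) / sizeof(double)); // 268,435,455 elements
double[] data = new double[n];                    // throws anyway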

I know that any quantity approaching that number is just a lot of elements but, for all intents and purposes, let's say I need it.

Note: I'm in a 64-bit environment, my application targets AnyCPU, and I have at least 3100 MB of free RAM.

Update: Thank you all for your contributions, and sorry I was so quiet; I apologise for the inconvenience. I have not been able to rephrase my question, but I can add that what I am looking for is a way to solve something like this:

template <class T>
array<T>^ allocateAnUsableArrayWithTheMostElementsPossible(){
    return gcnew array<T>( ... );
}

The results in my own answer are kinda satisfactory but not good enough. Furthermore, I haven't tested it on another machine (it's kind of hard to find another machine with more than 4 GB). Besides, I have been doing some research on my own and it seems there's no cheap way to calculate this at run time. Anyhow, that was just a plus; none of the users of what I am trying to accomplish can expect to use the feature I am trying to implement without having the capacity.

So, in other words, I just want to understand why the maximum number of elements of an array doesn't add up to 2 GB, ceteris paribus. A top maximum is all I need for now.

Solution

Update: answer COMPLETELY rewritten. The original answer contained methods to find the largest possible addressable array on any system by divide and conquer; see the history of this answer if you're interested. The new answer attempts to explain the 56-byte gap.
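
For reference, a minimal C# sketch of that divide-and-conquer idea (my reconstruction, not the original answer's exact code). Note that the result reflects free memory and fragmentation as well as the CLR's cap:

static int FindMaxArrayLength<T>()
{
    // Binary search for the largest length that can be allocated right now.
    int lo = 0, hi = int.MaxValue;
    while (lo < hi)
    {
        int mid = lo + (hi - lo + 1) / 2;
        try
        {
            T[] probe = new T[mid]; // attempt the allocation
            GC.KeepAlive(probe);
            lo = mid;               // success: try something larger
        }
        catch (OutOfMemoryException)
        {
            hi = mid - 1;           // failure: shrink the upper bound
        }
    }
    return lo;
}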

In his own answer, AZ explained that the maximum array size is limited to less than the 2GB cap and, with some trial and error (or another method?), found the following (summary):

  • If the size of the type is 1, 2, 4 or 8 bytes, the maximum occupiable size is 2GB - 56 bytes;
  • If the size of the type is 16 bytes, the max is 2GB - 48 bytes;
  • If the size of the type is 32 bytes, the max is 2GB - 32 bytes.

I'm not entirely sure about the 16-byte and 32-byte situations. The total available size for the array might be different if it's an array of structs rather than a built-in type. I'll focus on type sizes of 1-8 bytes (of which I'm not that sure either; see the conclusion).
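
As a side note: modern .NET (6 and later) exposes the cap directly through Array.MaxLength, and the value it returns matches AZ's f(1) = N - 56 measurement for single-byte elements:

// Prints 2147483591 = 0x7FFFFFC7 = int.MaxValue - 56 on current runtimes.
Console.WriteLine(Array.MaxLength);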

Data layout of an array

To understand why the CLR does not allow exactly 2GB / IntPtr.Size elements, we need to know how an array is structured. A good starting point is this SO article, but unfortunately some of the information seems false, or at least incomplete. This in-depth article on how the .NET CLR creates runtime objects proved invaluable, as did this Arrays Undocumented article on CodeProject.

Taking all the information in these articles together, we arrive at the following layout for an array on 32-bit systems:

Single dimension, built-in type
SSSSTTTTLLLL[...data...]0000
^ sync block
    ^ type handle
        ^ length array
                        ^ NULL 

Each part is one system DWORD in size. On 64-bit Windows, this looks as follows:

Single dimension, built-in type
SSSSSSSSTTTTTTTTLLLLLLLL[...data...]00000000
^ sync block
        ^ type handle
                ^ length array
                                    ^ NULL 

The layout looks slightly different when it's an array of objects (e.g., strings or class instances). As you can see, a type handle for the objects in the array is added.

Single dimension, object type
SSSSSSSSTTTTTTTTLLLLLLLLtttttttt[...data...]00000000
^ sync block
        ^ type handle
                ^ length array
                        ^ type handle array element type
                                            ^ NULL 
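
As a hedged aside, the L (length) field in these diagrams can be peeked at with unsafe code. This assumes the current 64-bit CLR layout, where the data of a byte[] starts 16 bytes after the object start:

unsafe
{
    byte[] arr = new byte[42];
    fixed (byte* p = arr)            // p points at the first data byte
    {
        int length = *(int*)(p - 8); // the L field sits just before the data
        Console.WriteLine(length);   // prints 42
    }
}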

Looking further, we find that a built-in type, or actually any struct type, gets its own specific type handle (all uints share the same one, but an int has a different type handle for its array than a uint or byte). All arrays of objects share the same type handle, but have an extra field that points to the type handle of the element type.
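
The first claim is easy to check with standard BCL calls; int[] and uint[] report distinct type handles:

// Two different addresses are printed, one per array type.
Console.WriteLine(typeof(int[]).TypeHandle.Value);
Console.WriteLine(typeof(uint[]).TypeHandle.Value);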

A note on struct types: padding may not always be applied, which may make it hard to predict the actual size of a struct.

Still not 56 bytes...

To arrive at the 56 bytes of AZ's answer, I have to make a few assumptions. I assume that:

  1. the syncblock and type handle count towards the size of an object;
  2. the variable holding the array reference (object pointer) counts towards the size of an object;
  3. the array's null terminator counts towards the size of an object.

A syncblock is placed before the address the variable points at, which makes it look like it's not part of the object. But in fact I believe it is, and it counts towards the internal 2GB limit. Adding all of these, we get, for 64-bit systems:

ObjectRef + 
Syncblock +
Typehandle +
Length +
Null pointer +
--------------
40  (5 * 8 bytes)

Not 56 yet. Perhaps someone can have a look with Memory View while debugging to check what the layout of an array looks like under 64-bit Windows.
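
Short of the debugger, a rough alternative is to diff the GC heap around a single allocation. The GC may round object sizes, so treat the result only as an estimate of the header-plus-padding overhead:

long before = GC.GetTotalMemory(true);            // force a collection first
byte[] probe = new byte[100000];                  // big enough to land on the LOH
long after = GC.GetTotalMemory(true);
Console.WriteLine(after - before - probe.Length); // approximate overhead in bytes
GC.KeepAlive(probe);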

My guess is something along these lines (take your pick, mix and match):

  • 2GB will never be possible, as that is one byte into the next segment. The largest block should be 2GB - sizeof(int). But this is silly, as mem indexes should start at zero, not one;
  • Any object larger than 85,000 bytes will be put on the LOH (large object heap). This may include an extra pointer, or even a 16-byte struct holding LOH information. Perhaps this counts towards the limit;
  • Alignment: assuming the objectref does not count (it is in another memory segment anyway), the total gap is 32 bytes. It's quite possible that the system prefers 32-byte boundaries. Take a new look at the memory layout: if the starting point needs to be on a 32-byte boundary, and it needs room for the syncblock before it, the syncblock will end up at the end of the first 32-byte block. Something like this:

    XXXXXXXXXXXXXXXXXXXXXXXXSSSSSSSSTTTTTTTTLLLLLLLLtttttttt[...data...]00000000
    

    where XXX.. stands for skipped bytes.

  • multi-dimensional arrays: if you create your arrays dynamically with Array.CreateInstance with 1 or more dimensions, a single-dimension array will be created with two extra DWORDs containing the size and the lower bound of the dimension (even if you have only one dimension, but only if the lower bound is specified as non-zero; see the sketch below). I find this highly unlikely, as you would probably have mentioned it if this were the case in your code. But it would bring the total to 56 bytes of overhead ;).
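
The non-zero lower bound case from the last bullet is easy to demonstrate; such an array is a different type from a normal T[]:

Array a = Array.CreateInstance(typeof(int), new[] { 10 }, new[] { 1 });
Console.WriteLine(a.GetType());        // System.Int32[*], a non-SZ array
Console.WriteLine(a.GetLowerBound(0)); // 1
Console.WriteLine(a is int[]);         // False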

Conclusion

From all I gathered during this little research, I think that Overhead + Alignment - ObjectRef is the most likely and most fitting conclusion. However, a "real" CLR guru might be able to shed some extra light on this peculiar subject.

None of these conclusions explain why 16 or 32 byte datatypes have a 48 and 32 byte gap respectively.

Thanks for a challenging subject; I learned something along the way. Perhaps some people can take the downvote off when they find this new answer more related to the question (which I originally misunderstood, and apologies for the clutter this may have caused).

OTHER TIPS

So, I ran a li'l program to find out some hard values and this is what I found:

  • Given a type T, f(sizeof(T)) = N - d

    • Where f is the real maximum number of elements in an array of Ts.
    • N is the theoretical maximum number of elements, that is: Int32::MaxValue / sizeof(T).
    • And d is the difference between N and f(sizeof(T)).

Results:

  • f(1) = N - 56
  • f(2) = N - 28
  • f(4) = N - 14
  • f(8) = N - 7
  • f(16) = N - 3
  • f(32) = N - 1

I can see that every time the size doubles, the difference between the real size and the theoretical size roughly halves, but not cleanly in powers of 2. Any ideas why?

Edit: d is a number of elements of type T. To get d in bytes, compute sizeof(T) * d. That makes the byte gaps: 1*56 = 56, 2*28 = 56, 4*14 = 56, 8*7 = 56, 16*3 = 48 and 32*1 = 32, which is where the 2GB - 56 / 48 / 32 bytes figures summarized in the accepted answer come from.

Your process space is limited to 2GB unless you're [compiled AnyCPU or x64] and running in an x64 process [on an x64 machine]. This is probably what you're actually running into. Calculating the headroom you have in the process is not an exact science by any means.

(Nitpicker's corner: there is a /3GB switch and stacks of other edge cases which impact this. Also, the process needs to have virtual or physical space to be allocated into, too. The point is that, at the present time, most people will run into the OS per-process limit more often than any .NET limit.)

Update: my other answer contains the solution, but I leave this in for the info about Mono, C#, the CLR links and the discussion thread.

The maximum size of an array is limited by the size of an integer, not by the size of the objects it contains. But any object in .NET is limited to 2GB, period (thanks to Luke, and see EDIT), which limits the total size of your array: the sum of the individual elements plus a bit of overhead.

The reason that it chokes your system is, well, the system's available memory. A Win32 process only allows you to use 2GB of memory, and your program and the CLR already use quite a bit of that even before you allocate your array. The rest is what you can use for your array:

int alot = 640000000;                      // roughly what the program + CLR use already
byte[] xxx = new byte[(1U << 31) - alot];  // parenthesized: without the parens,
                                           // << binds looser than -, so the original
                                           // computed 1U << (31 - alot)

Whether or not you run out of memory also depends on how your CLR is configured. For instance, under ASP.NET you are bound by default to 60% of the total available memory of the machine.

EDIT: This answer to a related post goes a bit deeper into the subject and the problems with 64-bit. It is possible on 64-bit systems, but only using workarounds. It points to this excellent blog post on the subject, which explains BigArray<T>.
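
For flavor, here is a minimal sketch of the chunked-storage idea behind BigArray<T> (my own simplification, not the blog post's code): storage is split into blocks so that no single CLR object ever crosses the 2GB per-object limit.

using System;

class BigArray<T>
{
    private const int BlockSize = 1 << 20;  // 1M elements per block (arbitrary choice)
    private readonly T[][] blocks;
    public long Length { get; private set; }

    public BigArray(long length)
    {
        Length = length;
        int blockCount = (int)((length + BlockSize - 1) / BlockSize);
        blocks = new T[blockCount][];
        for (int i = 0; i < blockCount; i++)
        {
            // The last block may be shorter than BlockSize.
            long remaining = length - (long)i * BlockSize;
            blocks[i] = new T[Math.Min(remaining, BlockSize)];
        }
    }

    public T this[long index]
    {
        get { return blocks[index / BlockSize][index % BlockSize]; }
        set { blocks[index / BlockSize][index % BlockSize] = value; }
    }
}

Each access costs an extra division and array hop, which is the usual price for dodging the per-object cap.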

NOTE 1: other CLRs, e.g. Mono's, simply allow objects larger than 2GB.

NOTE 2: it is not the language that limits you. This compiles just fine in C#, but trying to find a machine that doesn't throw on it is rather futuristic (and frankly, the field in the Array class holding the length is an int, which means this will always throw on 32-bit; on a 64-bit implementation it is not guaranteed to throw, but it remains extremely likely to):

int[] xxx = new int[0xFFFFFFFFFFFFFFFF];  // 2^64-1

You also need to add the pointer size (System.IntPtr.Size) to each sizeof(T) to account for the reference to the object held in each array element.
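
In other words, an array of reference types only pays one pointer per slot in the array object itself, so a rough 64-bit upper bound on the slot count is:

long maxRefs = (1L << 31) / IntPtr.Size; // 2GB object / 8-byte references
Console.WriteLine(maxRefs);              // 268435456 (minus the usual overhead)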

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow