Question

Standard containers with an std::allocator have their size_type defined as std::size_t. However, is it possible to have an allocator that allocates objects whose size cannot be represented by a size_t? In other words, can a size_type ever be larger than size_t?


Solution

Yes, and this could be useful in some cases.

Suppose you have a program that wishes to access more storage than will fit in virtual memory. By creating an allocator that references memory mapped storage and mapping it as required when indirecting pointer objects, you can access arbitrarily large amounts of memory.

This remains conformant to 18.2:6 because size_t is defined as large enough to contain the size of any object, but 17.6.3.5:2 table 28 defines size_type as containing the size of the largest object in the allocation model, which need not be an actual object in the C++ memory model.

Note that the requirements in 17.6.3.5:2 table 28 do not constitute a requirement that the allocation of multiple objects should result in an array; for allocate(n) the requirement is:

Memory is allocated for n objects of type T

and for deallocate the assertion is:

All n T objects in the area pointed to by p shall be destroyed prior to this call.

Note area, not array. Another point is 17.6.3.5:4:

The X::pointer, X::const_pointer, X::void_pointer, and X::const_void_pointer types shall satisfy the requirements of NullablePointer (17.6.3.3). No constructor, comparison operator, copy operation, move operation, or swap operation on these types shall exit via an exception. X::pointer and X::const_pointer shall also satisfy the requirements for a random access iterator (24.2).

There is no requirement here that (&*p) + n should be the same as p + n.

It's perfectly legitimate for a model expressible within another model to contain objects not representable in the outer model; for example, non-standard models in mathematical logic.

OTHER TIPS

size_t is the unsigned integer type that sizeof yields.

sizeof yields the size of the type (or of the type of the expression) that is its operand. For arrays it yields the size of the whole array.

This implies that:

  • there cannot be ANY structure or union that is larger than what size_t can represent.

  • there cannot be any array that is larger than what size_t can represent.

In other words, if something fits in the largest block of consecutive memory that you can access, then its size must fit in size_t. (In non-portable but intuitive terms: on most systems size_t is as large as void* and can 'measure' the whole of your virtual address space.)

Edit: this next sentence is probably wrong. See below

Therefore the answer to "is it possible to have an allocator that allocates objects whose size cannot be represented by a size_t?" is no.

Edit (addendum):

I've been thinking about it and the above may in fact be wrong. I've checked the standard and it seems possible to design a completely custom allocator with completely custom pointer types, including different types for pointer, const pointer, void pointer and const void pointer. Therefore an allocator can in fact have a size_type that is larger than size_t.

But to do so you need to actually define completely custom pointer types and the corresponding allocator and allocator traits instances.

The reason I say 'may' is that I'm still a bit unclear on whether size_type needs to span only the size of a single object or also the size of multiple objects (that is, an array) in the allocator model. I'll need to investigate this detail (but not now; it's dinner time here :) )

Edit2 (new addendum):

@larsmans I think you may want to decide what to accept anyway. The problem seems to be a little more complicated than one may intuitively realize. I'm editing the answer again as my thoughts are definitely more than a comment (both in content and in size).

ReEdit (as pointed out in the comments the next two paragraphs are not correct):

First of all, size_type is just a name. You can of course define a container and add a size_type to it with whatever meaning you wish. Your size_type could be a float, a string, whatever.

That said, in standard library containers size_type is defined in the container only to make it easy to access. It is in fact supposed to be identical to the size_type of the allocator for that container (and the size_type of the allocator should be the size_type of the allocator_traits of that allocator).

Therefore we shall henceforth assume that the size_type of the container, even one you define, follows the same logic 'by convention'. @BenVoight begins his answer with "As @AnalogFile explains, no allocated memory can be larger than size_t. So a container which inherits its size_type from an allocator cannot have size_type larger than size_t.". In fact we are now stipulating that if a container has a size_type then it comes from the allocator (he says 'inherits', but of course not in the sense of class inheritance).

However he may or may not be 100% right that a size_type (even if it comes from an allocator) is necessarily constrained to size_t. The question really is: can an allocator (and the corresponding traits) define a size_type that is larger than size_t?

Both @BenVoight and @ecatmur suggest a use case where the backing store is a file. However, if the backing store is a file only for the content, and you keep something in memory that refers to that content (call it a 'handle'), then you are in fact building a container of handles. A handle is an instance of some class that stores the actual data in a file and keeps in memory only whatever it needs to retrieve that data. But this is irrelevant to the container: the container stores the handles, the handles are in memory, and we are still in the 'normal' address space, so my initial response still holds.

There is another case, however. You are not allocating handles; you are actually storing stuff in the file (or database), and your allocator (and the corresponding traits) define pointer, const pointer, void pointer, const void pointer etc. types that directly manage that backing store. In this case, of course, they also need to define size_type (replacing size_t) and difference_type (replacing ptrdiff_t) to match.

The direct difficulty in defining size_type (and difference_type) as larger than size_t arises when size_t is already as large as the largest primitive integral type the implementation provides (if it isn't, there is no difficulty): both types are required to be integer types.

Depending on how you interpret the standard this may be impossible (because, according to the standard, integer types are the types defined in the standard plus the extended integer types provided by the implementation) or possible (if you interpret it such that you can provide an extended integer type yourself), as long as you can write a class that behaves exactly like a primitive type. This was impossible in the old days (overloading rules made primitive types always distinguishable from user-defined types), but I'm not 100% up to date with C++11 and this may (or may not) have changed.

However, there are also indirect difficulties. You not only need to provide a suitable integer type for size_type; you also need to provide the rest of the allocator interface.

I've been thinking about it a little, and one problem I see is in implementing *p according to 17.6.3.5. In that *p syntax, p is a pointer as typed by the allocator traits. Of course we can write a class and define an operator* (the member version taking no arguments, doing pointer dereference). And one may think this can easily be done by 'paging in' the relevant part of the file (as @ecatmur suggests). However, there's a problem: *p must be a T& for that object. Therefore the object itself must fit in memory and, more importantly, since you may do T &ref = *p and hold that reference indefinitely, once you have paged in the data you are never allowed to page it out again. This means there may effectively be no way to properly implement such an allocator unless the whole backing store can also be loaded into memory.

Those are my early observations, and they seem to confirm my first impression that the real answer is no: there is no practical way to do it.

However, as you see, things are much more complicated than mere intuition seems to suggest. It may take quite a time to find a definitive answer (and I may or may not go ahead and research the topic further).

For the moment I'll just say: it seems not to be possible. Statements to the contrary shall only be acceptable if they are not based solely on intuition: post code and let people debate if your code fully conforms to 17.6.3.5 and if your size_type (which shall be larger than size_t even if size_t is as large as the largest primitive integer type) can be considered an integer type.

Yes and no.

As @AnalogFile explains, no allocated memory can be larger than size_t. So a container which inherits its size_type from an allocator cannot have size_type larger than size_t.

However, you can design a container type which represents a collection not entirely stored in addressable memory. For example, the members could be on disk or in a database. They could even be computed dynamically, e.g. a Fibonacci sequence, and never stored anywhere at all. In such cases, size_type could easily be larger than size_t.

I'm sure it's buried in the standard somewhere, but the best description I've seen for size_type is from the SGI-STL documentation. As I said, I'm sure it is in the standard, and if someone can point it out, by all means do.

According to SGI, a container's size_type is:

An unsigned integral type that can represent any nonnegative value of the container's distance type

It makes no claims that it must be anything besides that. In theory you could define a container that uses uint64_t, unsigned char, or anything in between. That it references the container's distance_type is the part I find interesting, since...

distance_type: A signed integral type used to represent the distance between two of the container's iterators. This type must be the same as the iterator's distance type.

This doesn't really answer the question, though, but it is interesting to see how size_type and size_t differ (or can). Regarding your question, see (and upvote) @AnalogFile's answer, as I believe it to be correct.

From §18.2/6

The type size_t is an implementation-defined unsigned integer type that is large enough to contain the size in bytes of any object.

So, if it were possible for you to allocate an object whose size cannot be represented by a size_t, the implementation would be non-conforming.

To add to the "standard" answers, note also the stxxl project, which is designed to handle terabytes of data using disk storage (and perhaps, by extension, network storage). See its vector header, for example, for the definition of size_type as uint64 (lines 731 and 742).

This is a concrete example of containers with sizes larger than memory can afford, or even than the system's integer types can handle.

Not necessarily.

I assume by size_type you mean the typedef inside most STL containers?

If so, then the very fact that size_type was added to all the containers, instead of just using size_t, means the STL reserves the right to make size_type any type it likes. (By default, in all implementations I'm aware of, size_type is a typedef for size_t.)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow