Question

The C++ standard guarantees that the contents of a std::vector are stored contiguously. But does it state that the total occupied memory is:

S = C+N*sizeof(T)

where:

  • S is the total size on the stack AND on the heap
  • C is the total size on the stack: C = sizeof(std::vector)
  • N is the capacity of the vector
  • T is the type stored

In other words, do I have the guarantee that there is no overhead per element? And if I have no such guarantee, is there a reason for that?

EDIT: to be clear, if I take the example of a std::list, it generally stores 2 extra pointers per element. So my question is: would such an implementation of a std::vector be standard-compliant?
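To make the notation concrete, here is how I would measure these quantities (a sketch; it deliberately ignores any allocator bookkeeping, which is exactly the overhead I am asking about):

    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<double> v(100);
        auto C = sizeof v;                    // size of the vector object itself
        auto N = v.capacity();                // number of elements there is room for
        auto S = C + N * sizeof(double);      // total I would expect with zero overhead
        std::printf("C=%zu N=%zu S=%zu\n", C, N, S);
    }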


Solution

For there to be any such guarantee, the standard would have to pass the requirement on to the interface of the allocator. It doesn't, so there isn't.

In practice though, as a quality-of-implementation issue, you expect that memory allocators probably have a constant overhead per allocation but no overhead proportional to the size of the allocation. A counter-example to this would be a memory allocator that always uses a power-of-two-sized block regardless of the size requested. This would be pretty wasteful for large allocations, but not forbidden either as a user-defined allocator or even as the system allocator used by ::operator new. It would create an overhead proportional to N on average, assuming that the vector capacities don't happen to fit nicely.
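For illustration, here is a minimal user-defined allocator along those lines (a sketch, not taken from any real library; the rounding policy is the point, the rest is the C++11 minimal-allocator boilerplate):

    #include <cstddef>
    #include <new>
    #include <vector>

    // Rounds every request up to a power of two -- legal, but wasteful.
    template <class T>
    struct Pow2Allocator {
        using value_type = T;

        Pow2Allocator() = default;
        template <class U> Pow2Allocator(const Pow2Allocator<U>&) {}

        T* allocate(std::size_t n) {
            std::size_t bytes = n * sizeof(T);
            std::size_t rounded = 1;
            while (rounded < bytes) rounded <<= 1;   // waste grows with the request
            return static_cast<T*>(::operator new(rounded));
        }
        void deallocate(T* p, std::size_t) { ::operator delete(p); }
    };

    template <class T, class U>
    bool operator==(const Pow2Allocator<T>&, const Pow2Allocator<U>&) { return true; }
    template <class T, class U>
    bool operator!=(const Pow2Allocator<T>&, const Pow2Allocator<U>&) { return false; }

    // std::vector<int, Pow2Allocator<int>> v;  // conforming, yet its real footprint
    //                                          // can far exceed capacity()*sizeof(int)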

Leaving aside the allocator, I don't believe there's anything in the standard to say that the vector can't allocate (for example) an extra byte per element and use it to store some flags for who-knows-what purpose. As others have remarked, the contiguousness requirement means that those extra bytes cannot lie between the vector elements. They would have to be in a separate allocation or all together at one end of the allocation.

There's at least one good reason that the standard doesn't forbid implementations from "wasting" space by using it to store data used for operations not required by the standard -- doing so would rule out many debugging techniques!

OTHER TIPS

Do I have the guarantee that there is no overhead per element?

Does the standard prohibit it? No.
But would you ever expect to see this in practice? No.

The rule of contiguous data storage and the complexity requirements of vector growth mean that the only possible way for a non-constant-sized data block to be part of the vector would be if it were emplaced directly before the dynamically-allocated element data, or somewhere else entirely. There is no guarantee that this doesn't happen, but, quite simply, no implementation does it because it would be entirely ridiculous and serve no purpose whatsoever.

Does it state that the total occupied memory is:

S = C+N*sizeof(T)

There may be other data members of the vector itself (what you've inaccurately deemed to be "on the stack"), increasing the object's size in constant terms.

The standard gives no guarantee, afaics. But the requirement that the elements be stored contiguously makes it likely that there is no per element overhead. The whole data must be in a memory area which was allocated in one piece. @aschepler remarked correctly though that typical free store implementations have a (constant) overhead per allocation unit, typically a size variable or an end pointer.

Additionally there may be some padding overhead, e.g. an allocation unit will probably span multiples of the natural word size on a machine. And then the OS call will likely reserve a whole memory page to the program, even if you allocate only 1 byte. Whether you consider that as overhead or not is a matter of taste (from the outside yes, from the inside of the program no; and of course subsequent vectors or resize()s dine from the same page).

So at least it's CM + CV + N*sizeof(T), with CV being the overhead in the vector object itself (not necessarily on the stack, as Lightness said) and CM the overhead of the memory management.
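A quick way to look at CV on a given implementation (a sketch; CM, the allocator's per-allocation bookkeeping, is not observable from portable code):

    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<int> v(1000);
        // CV: the vector object itself -- typically three pointers
        // (begin, end, end-of-storage) on mainstream implementations.
        std::printf("sizeof(std::vector<int>) = %zu\n", sizeof v);
        // N * sizeof(T): the element buffer, as seen through capacity().
        std::printf("element buffer = %zu bytes\n", v.capacity() * sizeof(int));
    }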

No, the implementation characteristics you suggest would not be standard-compliant. The standard requires that a std::vector support appending individual elements in amortized constant time.

In order for the amortized cost of inserting an element to be O(1), the size of the array must increase in at least a geometric progression when it is reallocated (see here). A geometric progression means that if the size of the array was N, the new size after reallocation must be K * N, for some K > 1. The choice of K is implementation dependent.

To find out how much space a std::vector has allocated, call std::vector::capacity(). With regard to overhead per element, in the best case capacity() == size(); in the worst case, capacity() == K * (size() - 1).
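You can watch the growth factor on your own implementation with something like this (a sketch; the exact sequence of capacities is implementation-defined):

    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<int> v;
        auto last = v.capacity();
        for (int i = 0; i < 1000; ++i) {
            v.push_back(i);
            if (v.capacity() != last) {    // a reallocation just happened
                std::printf("size=%zu capacity=%zu\n", v.size(), v.capacity());
                last = v.capacity();
            }
        }
        // Typical output shows capacity growing roughly geometrically,
        // e.g. doubling or growing by 1.5x, depending on the library.
    }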

If you must ensure that your vector is absolutely no larger than it has to be, you can call std::vector::reserve() if you know exactly how large your std::vector will be. You may also call std::vector::shrink_to_fit() in C++11 (or swap with a temporary copy in C++03) after you are done adding elements to reduce the amount of memory reserved.
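For example (a sketch; note that shrink_to_fit() is formally a non-binding request, though common implementations honour it):

    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<int> v;
        v.reserve(100);                       // one allocation, capacity is at least 100
        for (int i = 0; i < 100; ++i)
            v.push_back(i);                   // no reallocations along the way

        v.erase(v.begin() + 50, v.end());     // size shrinks, capacity does not
        std::printf("size=%zu capacity=%zu\n", v.size(), v.capacity());

        v.shrink_to_fit();                    // C++11: ask for capacity to be trimmed to size()
        std::printf("size=%zu capacity=%zu\n", v.size(), v.capacity());
    }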

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow