Question

Please see the following link, page 22 onwards:

http://research.scee.net/files/presentations/gcapaustralia09/Pitfalls_of_Object_Oriented_Programming_GCAP_09.pdf

The above link suggests that if I have an object containing vectors/arrays like this:

class MyClass{
    public:
    double a[1000];
    double b[1000];
};

and the code below iterates through a std::vector of MyClass objects and performs some math on the member array b:

std::vector<MyClass> y;
y.populateVector(); // pseudocode: fill the vector somehow

for(auto x : y){
    //Iterate though x.b and do some math;
    for(int i=0; i<1000; i++){
        std::cout << x.b[i] << std::endl;
    }
}

when we retrieve each MyClass object, all the data from both of the arrays will be loaded into the cache. Is this true? I didn't think the data in a would be loaded into the cache, because the address for accessing b would be calculated and loaded directly.

I am trying to get an idea of how much of a MyClass object is loaded into the cache compared to the useful data required for processing.

I can understand the very first b element sharing a cache line with the very last a element, but I didn't think the whole object would get loaded into the L2/L3 cache just to process one part of it.

Solution 2

It depends on how the memory is organized on your system. If it just so happens that the backing storage for a and b sits close together in memory, it's possible that parts of both will be loaded, since the CPU will usually issue larger reads to fill a cache line in the hope that you use the extra data. If not, I see no reason why reading b would imply anything about a, aside from reading whatever pointers or metadata reside in the object itself.

What it does show is that laying classes out haphazardly can and will cause cache misses, purely because of the way they reside in memory.

The general rule for what gets loaded into the cache is: if the CPU issues a read and misses the cache, it loads a cache-aligned chunk from main memory (128 bytes in the presentation's examples).
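
As a concrete sketch of what "cache-aligned chunk" means, here is a hypothetical helper of mine (not from the answer) that computes the line an address belongs to, assuming 128-byte lines:

#include <cstdint>

// Base address of the cache line containing p, assuming 128-byte lines
// as in the presentation's examples.
std::uintptr_t line_of(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) & ~static_cast<std::uintptr_t>(127);
}

Two reads are served by the same miss exactly when line_of gives the same value for both addresses, so a single miss also brings in everything else sharing that line.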

For your edited example: yes, these are colocated pieces of memory, and parts of a MAY be loaded when reads of b are issued, simply because of their location in memory.

For your example, each MyClass object consists of a contiguous region of 2000 * sizeof(double) bytes (most likely aligned). These objects are packed into one contiguous memory region owned by the vector. Accessing the b member of an object will incur a cache miss (if it's not already cached), and each read that misses the cache loads one cache-aligned block of memory. Depending on the alignment constraints and the cache-line size, it's possible that some entries of the a member will be loaded alongside b. It's also possible, given suitable padding and alignment, that none of the a members will be loaded into the cache at all (and there's no reason they should be, as they were never accessed).
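
To see the layout concretely, here is a minimal sketch (assuming 8-byte doubles and no padding, which holds for this class on typical platforms):

#include <cstddef>
#include <iostream>
#include <vector>

class MyClass {
public:
    double a[1000];
    double b[1000];
};

int main() {
    std::vector<MyClass> y(3);

    // Each object is one contiguous block: a first, then b right after it.
    std::cout << "sizeof(MyClass): " << sizeof(MyClass) << '\n';      // 16000
    std::cout << "offset of b:     " << offsetof(MyClass, b) << '\n'; // 8000

    // The objects themselves are packed back to back inside the vector.
    std::cout << "y[1] - y[0] in bytes: "
              << reinterpret_cast<char*>(&y[1]) - reinterpret_cast<char*>(&y[0])
              << '\n';                                                // 16000
}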

OTHER TIPS

Your declaration:

for(auto x : y) ...

declares x as a value instead of a reference, so each element of y is copied into the local variable x. The compiler might be able to optimize the copy away, but I wouldn't count on it.

If you write:

for(auto &x : y) ...

Then the loop will work on references to the objects in y. I'm assuming that's what you meant to do.
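
To make the difference concrete, here is a side-by-side sketch (the function names are mine, for illustration only):

#include <vector>

struct MyClass { double a[1000]; double b[1000]; };

// By value: every iteration copies all 16000 bytes of the element,
// including the a array this loop never uses.
double sum_by_value(const std::vector<MyClass>& y) {
    double s = 0;
    for (auto x : y) s += x.b[0];
    return s;
}

// By reference: no copy; each read goes straight to the vector's storage.
double sum_by_reference(const std::vector<MyClass>& y) {
    double s = 0;
    for (const auto& x : y) s += x.b[0];
    return s;
}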

In concrete terms, ignoring struct padding: the compiler will convert

double temp = y[i].b[j];

into something equivalent to

double temp = *reinterpret_cast<const double*>(
    reinterpret_cast<const char*>(y.data()) // byte address of y[0]
    + i * sizeof(MyClass)                   // start of y[i]
    + 1000 * sizeof(double)                 // skip over y[i].a
    + j * sizeof(double));                  // index into y[i].b

and it will load the cache-line-sized block containing that address into a cache line.

Then, as you iterate over more elements of y[i].b, many of them will already be in the cache.

Since the arrays contain 1000 elements each, they are much bigger than a cache line on a typical CPU: 1000 doubles take up 8000 bytes, whereas cache lines on the Sandy Bridge architecture (for example) are 64 bytes. Iterating over the arrays therefore makes effective use of every cache line loaded. You might waste part of a cache line where the last elements of x.a share a line with the start of x.b, but the effect should be small, and as the size of your arrays increases, the significance of these wasted loads approaches zero.
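
To put rough numbers on it (assuming 64-byte lines, as above):

#include <cstddef>
#include <iostream>

int main() {
    constexpr std::size_t kLineBytes  = 64;                     // assumed line size
    constexpr std::size_t kArrayBytes = 1000 * sizeof(double);  // 8000 bytes

    // Streaming through one b array touches roughly this many lines:
    std::cout << kArrayBytes / kLineBytes << " lines per array\n";  // 125

    // At worst, the line holding b[0] also holds the tail of a, and the line
    // holding b[999] holds the head of the next object's a: about two partly
    // wasted lines out of ~125, well under 2% of the traffic.
    std::cout << 2.0 / (kArrayBytes / kLineBytes) << " wasted fraction\n";
}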

The PlayStation presentation talks about objects that are comparable in size to a cache line. Those optimizations won't matter as much for objects as large as yours.

In the link you are referring to, the two arrays a and b are 4x4 matrices, i.e. 16 elements each. As this is about a video game, they are likely single-precision floats, so 16 floats take 64 bytes, while a cache line on that hardware is 128 bytes. There is therefore a significant chance that a large part of a is in the same cache line as b[0]; statistically, 50% of a will be. Reading b[0] will then load that part of a into the cache. If you manage to align the class/struct on 128 bytes, you are even guaranteed that a and b fit entirely in the same cache line.

Now, in your example you do not use 16 floats but 1000 doubles. That is 8000 bytes, much larger than a typical cache line. The last few elements of a may be in the same cache line as b[0], but the effect will be small.
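
If the shared-line effect does matter (as with the presentation's small matrices), the usual remedy, and the direction the presentation takes, is to split hot and cold data into separate arrays. A minimal sketch; Scene, Matrix4x4, and sum_first_elements are hypothetical names of mine:

#include <vector>

struct Matrix4x4 { float m[16]; };   // 64 bytes, as in the slides

// Hot/cold split: parallel vectors instead of one vector of objects holding
// both matrices. A loop over b now streams through memory containing
// nothing but b data, so no part of a is dragged into the cache.
struct Scene {
    std::vector<Matrix4x4> a;   // cold in this loop
    std::vector<Matrix4x4> b;   // hot in this loop
};

float sum_first_elements(const Scene& s) {
    float total = 0;
    for (const Matrix4x4& m : s.b)   // sequential, line-friendly reads
        total += m.m[0];
    return total;
}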

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow