What is the largest amount of data that a single x86 instruction will read from or write to the L1 cache?

StackOverflow https://stackoverflow.com/questions/22231625

  •  10-06-2023

Question

I just read up on AVX (Wikipedia), and it brought this question to my mind.


Solution

I'm not sure your question is entirely clear, but I think you're asking how much data can be transferred to or from the L1 cache when a single x86 instruction executes.

If so, it's an ill-posed question. Neither the cache structure nor caching as a concept is part of the x86 specification, so the answer depends entirely on the underlying hardware. If you have a specific processor in mind, you can probably find the answer in its data sheet. What you're looking for is the cache block (line) size, since cache controllers like to read and write whole blocks at a time. That said, x86 extensions such as SSE and AVX include instructions that deal specifically with large memory transactions, and those can read or write as much of the cache as is required or convenient.
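Since the block size depends on the hardware, one way to find it is to ask the machine you are running on. The following is a minimal sketch, assuming a Linux system that exposes cache topology under `/sys/devices/system/cpu`; the function name `l1d_line_size` is made up for this example, and it returns `None` wherever that information is not available.

```python
from pathlib import Path


def l1d_line_size(cpu=0):
    """Return the L1 data cache line size in bytes, or None if unavailable.

    Assumes the Linux sysfs cache layout; on other systems the directory
    does not exist and the function simply returns None.
    """
    base = Path(f"/sys/devices/system/cpu/cpu{cpu}/cache")
    try:
        for index in sorted(base.glob("index*")):
            level = (index / "level").read_text().strip()
            kind = (index / "type").read_text().strip()
            # Skip the L1 instruction cache; accept a data or unified L1.
            if level == "1" and kind in ("Data", "Unified"):
                return int((index / "coherency_line_size").read_text())
    except (OSError, ValueError):
        pass
    return None


if __name__ == "__main__":
    size = l1d_line_size()
    print("L1 data cache line size:", size if size is not None else "unknown")
```

The value reported is a property of the processor, not of the x86 instruction set, which is the point made above.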

Other tips

You never read from or write to any cache level explicitly. Under any reasonable interpretation of the question, though, a read from L1$ puts the value into a register, and a write to L1$ takes the value from a register. For all practical purposes, then, the basic answer is always “the size of the register you're using” as the source or destination of the architectural instruction in question.

In reality, it's a bit more complex than that, because it depends on the width of the path between the MOB (memory order buffer) and L1$, which is a feature of the particular microarchitecture. Recent Intel CPUs (e.g. Core, Nehalem) have had 128-bit paths from the MOB to L1$, but I don't know whether the most recent ones (e.g. Haswell) have widened that to 256 bits to match the AVX register size. That's one possibility. The other is that a single architectural store of a 256-bit AVX register decodes to two 128-bit µops (micro-operations) in the back end. The latter seems more likely for Sandy Bridge and Ivy Bridge, owing to the “double-ganged” use of two 128-bit execution units to achieve 256-bit AVX operations. I don't know enough about the Haswell microarchitecture to speculate about what it might do.
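The arithmetic behind the two possibilities can be sketched in a few lines. This is a simplified model, not a description of any particular CPU: the path widths are plain parameters, and `REGISTER_BITS` and `transfers_needed` are names invented for this example.

```python
# Architectural register widths in bits; the labels are only for display.
REGISTER_BITS = {
    "64-bit general-purpose": 64,
    "SSE (xmm)": 128,
    "AVX (ymm)": 256,
}


def transfers_needed(register_bits, path_bits):
    """Path-wide transfers needed to move one register (ceiling division)."""
    if register_bits <= 0 or path_bits <= 0:
        raise ValueError("widths must be positive")
    return -(-register_bits // path_bits)


if __name__ == "__main__":
    # Hypothetical path widths, chosen to mirror the two cases discussed.
    for path_bits in (128, 256):
        for name, bits in REGISTER_BITS.items():
            count = transfers_needed(bits, path_bits)
            print(f"{name}: {bits // 8} bytes, "
                  f"{count} transfer(s) over a {path_bits}-bit path")
```

Under this model, a 256-bit AVX store needs two transfers over a 128-bit path and one over a 256-bit path, while narrower registers always fit in a single transfer.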

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow