Question

According to Wikipedia: Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to a processor.

But it is not clear whether this refers to any memory, including caches, or to main memory only.

For example, the Xeon Phi processor has the following architecture: [architecture diagram]

Memory access to main memory (GDDR) is the same for all cores. Meanwhile, access to the L2 cache differs between cores: a core's own L2 cache is checked first, and then the L2 caches of the other cores are checked via the ring. Is this a NUMA or UMA architecture?


Solution

Technically, NUMA should probably only be used to describe non-uniform access latency or bandwidth to main memory. (If the NUMA factor [latency far/latency near or bandwidth far/bandwidth near] is small [e.g., comparable to dynamic variability due to DRAM row misses, buffering, etc.], then the system might still be considered UMA.)
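As a rough illustration of the definition above (the latency values and the tolerance threshold are invented, not measurements from any real machine), the NUMA factor can be computed and compared against the system's normal dynamic variability:

```python
# Sketch: classify a system as UMA or NUMA from measured latencies.
# All numbers and the 10% tolerance are illustrative assumptions.

def numa_factor(latency_far_ns, latency_near_ns):
    """NUMA factor = far latency / near latency (>= 1.0 by convention)."""
    return latency_far_ns / latency_near_ns

def is_effectively_uma(latency_far_ns, latency_near_ns, tolerance=0.10):
    """Treat the system as UMA if the far/near gap is within the
    dynamic variability seen from DRAM row misses, buffering, etc."""
    return numa_factor(latency_far_ns, latency_near_ns) <= 1.0 + tolerance

print(numa_factor(120.0, 80.0))        # 1.5  -> clearly NUMA
print(is_effectively_uma(85.0, 80.0))  # True -> effectively UMA
```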

(Technically, the Xeon Phi has a small but non-zero NUMA factor since each hop on the ring interconnect takes time [a core might be only one hop from one memory controller and several hops from the most distant one].)
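To make the ring argument concrete, here is a toy model (the stop counts, base latency, and per-hop cost are assumptions, not real Xeon Phi figures) showing why latency grows with hop distance to a memory controller, yielding a small but non-zero NUMA factor:

```python
# Toy model of memory latency on a bidirectional ring interconnect.
# base_ns and per_hop_ns are assumed values, not Xeon Phi measurements.

def ring_hops(core, mc, ring_size):
    """Minimum hop count between a core stop and a memory-controller
    stop on a bidirectional ring (traffic can go either direction)."""
    d = abs(core - mc) % ring_size
    return min(d, ring_size - d)

def access_latency(core, mc, ring_size, base_ns=100.0, per_hop_ns=2.0):
    """Latency = fixed DRAM/controller cost + per-hop ring traversal."""
    return base_ns + per_hop_ns * ring_hops(core, mc, ring_size)

# Core 0 on a 32-stop ring, memory controllers at stops 1 and 17:
near = access_latency(0, 1, 32)   # 1 hop   -> 102.0 ns
far = access_latency(0, 17, 32)   # 15 hops -> 130.0 ns
print(far / near)                 # ~1.27: small but non-zero NUMA factor
```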

The term NUCA (Non-Uniform Cache Access) has been taken to describe a single cache with different latency of access for different cache blocks. A shared cache level with portions more closely tied to a core or cluster of cores would also fall under NUCA, but separate cache hierarchies would (I believe) not justify the term (even though snooping might find a desired cache block in a 'remote' cache).

I do not know of any term being used to describe a system with variable cache latency associated with snooping (i.e., with separate cache hierarchies) and a small/zero NUMA factor.

(Since caches can transparently replicate and migrate cache blocks, the NUMA concept is a little less fitting. [Yes, an OS can migrate and replicate pages transparently to application software in a NUMA system, so this difference is not absolute.])

Perhaps somewhat interestingly, Azul Systems claims UMA across sockets for its Vega systems:

Azul builds our gear as 'UMA' because our programs do not have well-understood access patterns. Instead, the patterns are mostly random (after cache filtering), so it makes sense to have uniformly mediocre speeds rather than 1/16th of memory being fast and 15/16ths slow.

OTHER TIPS

I would definitely go for UMA on that one. Having looked at the NUMA wiki page, as you did, I found that the wiki page for UMA actually gives a little more detail:

In the UMA architecture, each processor may use a private cache. (Wikipedia)

This is why I would go for UMA, and also because, as you state, once the L2 is missed, the request goes to a general memory directory and the correct physical page is found via the memory controller (MC in your picture).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow