Question

I've been learning CUDA programming and have run into a problem. The main question is: why does CUDA use so many kinds of memory (global, local, shared, constant, texture, caches, registers), unlike a CPU, where we deal with only a few main kinds of memory (RAM, caches, hard disk, etc.)?


Solution

The main reasons for having multiple kinds of memory are explained in this article: Wikipedia: Memory Hierarchy

To summarize it in a very simplified form:

  • It is usually the case that the larger the memory is, the slower it is.
  • Memory can be read and written faster when it is "closer" to the processor.
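
To make this concrete, here is a minimal CUDA C++ sketch (with made-up kernel and variable names) showing where the memory kinds from the question appear in code; the __constant__, __device__, and __shared__ qualifiers are standard CUDA:

    // Illustrative only: shows which declaration lands in which memory space.
    __constant__ float coeffs[16];    // constant memory: small, cached, read-only in kernels
    __device__   float table[1024];   // global memory: large, off-chip, slow

    __global__ void kernel(const float *in, float *out, int n)
    {
        __shared__ float tile[256];   // shared memory: on-chip, per-block, fast

        int i = blockIdx.x * blockDim.x + threadIdx.x;  // 'i' lives in a register
        if (i < n)
            tile[threadIdx.x] = in[i];                  // 'in'/'out' point into global memory
        __syncthreads();
        if (i < n)
            out[i] = tile[threadIdx.x] * coeffs[0] + table[i % 1024];
    }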

As mentioned in the comment: on the CPU, you also have several layers of memory: the main memory and several levels of caches. The caches are much smaller than main memory, but much faster. They are managed by the hardware, so as a software developer you do not directly notice that they exist at all; all the data seems to be in the main memory.

On the GPU, you have to manage this memory manually (although in newer CUDA versions, you can also configure the shared-memory storage to act as a cache and let CUDA take care of the data management).
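
As a sketch of that option (assuming a device where the L1 cache and shared memory share the same on-chip storage; myKernel is a placeholder name), the runtime API lets you hint how that storage should be split for a particular kernel:

    #include <cuda_runtime.h>

    __global__ void myKernel(float *data) { /* ... */ }

    void configure()
    {
        // Ask for a larger hardware-managed L1 cache instead of
        // software-managed shared memory for this kernel. The runtime
        // treats this as a preference and may ignore it.
        cudaFuncSetCacheConfig(myKernel, cudaFuncCachePreferL1);
    }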

For example, reading some data from the shared memory in CUDA may be done within a few NANOseconds. Reading data from global memory may take a few MICROseconds. One of the keys to high performance in CUDA is thus data locality: You should try to keep the data that you are working on in local or shared memory, and avoid reading/writing data in global memory.
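
To illustrate the locality point, here is a sketch of a classic shared-memory tiling pattern, a 1D three-point average (kernel name and tile size are made up): each input element is fetched from slow global memory once, staged in fast shared memory, and then reused by neighboring threads.

    #define TILE 256  // assumed block size

    __global__ void smooth(const float *in, float *out, int n)
    {
        __shared__ float tile[TILE + 2];               // +2 for the halo elements

        int g = blockIdx.x * blockDim.x + threadIdx.x; // global index
        int l = threadIdx.x + 1;                       // local index (skip left halo)

        if (g < n)
            tile[l] = in[g];                           // one global read per element
        if (threadIdx.x == 0 && g > 0)
            tile[0] = in[g - 1];                       // left halo
        if (threadIdx.x == blockDim.x - 1 && g + 1 < n)
            tile[l + 1] = in[g + 1];                   // right halo
        __syncthreads();

        if (g > 0 && g + 1 < n)                        // interior points only
            out[g] = (tile[l - 1] + tile[l] + tile[l + 1]) / 3.0f;
    }

    // Launch (for example): smooth<<<(n + TILE - 1) / TILE, TILE>>>(d_in, d_out, n);

Without the shared tile, each interior thread would issue three global-memory reads instead of one.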

(P.S.: The "Close" votes that mark this question as "Primarily Opinion Based" are somewhat ridiculous. The question may show a lack of own research, but is a reasonable question that can clearly be answered here)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow