MATLAB twice as fast as Numpy

Question 1

That comparison ends up being apples to oranges because of caching: transferring or operating on contiguous chunks of memory is much more efficient than working on scattered elements. This particular benchmark is memory bound, since no actual computation is done, so the cache hit rate is key to good performance.

MATLAB lays out its data in column-major order (Fortran order), so a(:,:,k) is a contiguous chunk of memory, which is fast to copy.

NumPy defaults to row-major order (C order), so in a[:,:,k] there are big jumps between consecutive elements, which slows down the memory transfer. The data layout can be chosen, though. On my laptop, creating the array with a = np.asfortranarray(np.random.rand(5000,5000,3)) leads to a 5x speedup (1 s vs 0.19 s).
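To make the layout difference concrete, here is a minimal sketch that times copying each a[:,:,k] plane for both orderings. The helper copy_planes and the smaller array shape are my own assumptions for illustration, not the original benchmark, and the actual timings will depend on your machine.

```python
import timeit

import numpy as np


def copy_planes(a):
    """Copy every a[:, :, k] plane; hypothetical stand-in for the memory-bound loop."""
    return [a[:, :, k].copy() for k in range(a.shape[2])]


# Assumption: a smaller array than the 5000x5000x3 one above, so this runs quickly.
shape = (1000, 1000, 3)
a_c = np.random.rand(*shape)    # NumPy default: C (row-major) order
a_f = np.asfortranarray(a_c)    # same values, Fortran (column-major) order

# In C order the plane is strided: consecutive elements are 3 doubles apart.
print("C-order plane strides:", a_c[:, :, 0].strides)
# In Fortran order the plane is one contiguous block of memory.
print("F-order plane contiguous:", a_f[:, :, 0].flags["F_CONTIGUOUS"])

t_c = timeit.timeit(lambda: copy_planes(a_c), number=20)
t_f = timeit.timeit(lambda: copy_planes(a_f), number=20)
print(f"C order: {t_c:.3f} s, Fortran order: {t_f:.3f} s")
```

If the Fortran-ordered version is not noticeably faster on your machine, try a larger shape: the effect is most visible once the array no longer fits in cache.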

This result should be very similar for both numpy-MKL and plain numpy, because MKL provides fast BLAS/LAPACK routines and here you're not calling any function that uses them (MKL definitely helps when solving linear systems, computing dot products, and so on).

I don't really know what's going on with the Gauss-Seidel solver, but some time ago I wrote an answer to a question titled "Numpy running at half the speed of MATLAB" that talks a little about MKL, FFT, and MATLAB's JIT.

Question 2

You are attempting to recreate the NASA experiment, but you have changed many of the variables. For example:

Your hardware and operating system are different (www.nccs.nasa.gov/dali_front.html)
Your Python version is different (2.5.3 vs 3.3)
Your MATLAB version is different (2008 vs 2012)

Assuming the NASA results are correct, the difference in results is due to one or more of these changed variables. I recommend you:

Retest with the SciPy prebuilt binaries.
Research whether any improvements were made to MATLAB for this type of calculation.

Also, you may find this link useful.