Question

Knuth recently objected to 64-bit systems, saying that for programs which fit in 4 gigs of memory, "they effectively throw away half of the cache" because the pointers are twice as big as on a 32-bit system.

My question is: can this problem be avoided by installing a 32-bit operating system on a 64-bit machine? And are there any bandwidth-intensive benchmarks which demonstrate the advantage in this case?

Was it helpful?

Solution

The answer is: yes it can to a certain extent, although the performance difference is unlikely to be great.

Any benchmark to test this will have to do a lot of pointer resolution, which will be difficult to separate out from the noise. Designing a benchmark that will not optimise away is difficult. This article about flawed java benchmarks was posted by someone in response to another question, but many of the principles described in it will apply to this.

OTHER TIPS

Bandwidth is not really the correct term here. What Knuth was really talking about was data density, as it relates to cache footprint. Imagine that you have a 16KB L1 data cache: If you're purely storing pointers, you can store 2^14/2^2 = 2^12 = 4096 32-bit pointers, but only 2048 64-bit pointers. If the performance of your application depends on being able to keep track of over 2K different buffers, you may see a real performance benefit from a 32-bit address space. However, most real code is not this way, and real performance benefits from a caching system often come from being able to cache common integer and floating-point data structures, not huge quantities of pointers. If your working set is not pointer-heavy, the downside of 64-bit becomes negligible, and the upside becomes much more obvious if you're performing a lot of 64-bit integer arithmetic.

I don't think Knuth objected to 64-bit systems. He just said that using 64-bit pointers on a system that has less than 4GB ram is idiotic (at least if you have lots of pointers like the ones in a double-linked list). I can't say that I agree with him, here are 3 different ways that can be taken. Let's assume you have a 64-bit capable CPU that can also run in 32-bit mode like some Intel Core Duo.

1 - Everything is 32-bit, the OS, the APPZ, all of them. So you have 32-bit pointers but you can not use the extra registers/instructions that are available on 64-bit mode.

2 - Everything is 64-bit, the OS, the APPZ, all of them. So you have 64-bit pointers and you can use the extra registers/instructions that are available on 64-bit mode. But as you have less than 4GB ram, using 64-bit pointers seems like idiotic. But, is it ?

3 - OS is 64-bit and OS interestingly makes sure that all the code/data pointers are in the 0x00000000 - 0xFFFFFFFF range (Virtual Memory !!!). The ABI runs in a very strange way that all the code/data pointers kept in memory/files are 32-bit wide but they are loaded into 64-bit registers as zero-extended. If there is a code location to jump, compiler/ABI does the necessary fix-ups and does the actual 64-bit jump. This way, pointers are 32-bit but APPZ can be 64-bit meaning they can make use of the 64-bit registers and instructions. This process is something like thunking, I think ;-P

My conclusion is ::

The 3rd option seemed doable to me but it is not an easy problem. In theory it can work but I do not think it is feasible. And I also think that his quote "When such pointer values appear inside a struct, they not only waste half the memory, they effectively throw away half of the cache." is exaggerated...

i've seen somewhere that the best mix (on x86 CPUs) is to use a 64-bit OS and 32-bit applications.

with a 64-bit OS you get:

  • ability to handle more than 4GB of address space
  • more, bigger registers to help in data-copying operations

with a 32-bit app you get:

  • smaller pointers
  • less, smaller registers to save on context switches

cons:

  • all libraries must be duplicated. tiny by HD space standards.
  • all loaded libraries are duplicated on RAM. not so tiny...

surprisingly, there seems not to be any overhead when switching modes. I guess that breaking from userspace to kernel costs the same, no matter the bitness of the userspace.

of course, there are some applications that benefit from big address space. but for everything else, you can get an extra 5% performance by staying at 32-bit.

and no, i don't care about this small speedup. but it doesn't "offend" me to run 32-bit FireFox on a 64-bit KUbuntu machine (like i've seen on some forums)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top