What is the motivation for casting a pointer into an integer?

https://softwareengineering.stackexchange.com/questions/290574

09-10-2020

Question

I'm doing some changes in the Linux kernel code and have noticed a pointer being cast into integer.

Check out buf below (full code):

snd_pcm_sframes_t snd_pcm_lib_read(struct snd_pcm_substream *substream, void __user *buf, snd_pcm_uframes_t size)
{
    ....
    return snd_pcm_lib_read1(substream, (unsigned long)buf, size, nonblock, snd_pcm_lib_read_transfer);
}

It gets converted from void __user *buf to (unsigned long)buf. Afterwards in the code, buf is treated as a long and not as a pointer.

Even though at the technical level both are just numbers in memory, so far I've thought of them as conceptually different things. Is there a low-level pattern for when this is used?

Solution

A typical reason (in hosted user-space applications) to cast a pointer to intptr_t (from the <stdint.h> header, standard since C99) is to compute a hash code from that pointer:

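#include <stdint.h>
struct foo_st;  /* only the address is hashed, so a forward declaration is enough */
uint32_t hash_of_foo_ptr(struct foo_st *foo) {
   return (uint32_t)((((intptr_t)foo) * 613) ^ ((intptr_t)foo % 51043));
}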
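As a rough sketch of where such a hash ends up, here is one way it could select a bucket in a pointer-keyed table. The names NBUCKETS and bucket_of_ptr, and the body of struct foo_st, are made up for illustration. This variant uses uintptr_t rather than intptr_t so that the multiplication wraps around instead of overflowing a signed type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct foo_st { int payload; };   /* hypothetical stand-in type */

enum { NBUCKETS = 64 };           /* hypothetical table size */

/* Same mixing as the hash above, but on uintptr_t so the multiply
   wraps instead of overflowing a signed integer. */
uint32_t hash_of_foo_ptr(const struct foo_st *foo) {
    uintptr_t a = (uintptr_t)foo;
    return (uint32_t)((a * 613u) ^ (a % 51043u));
}

/* Map a pointer to a bucket index in [0, NBUCKETS). */
size_t bucket_of_ptr(const struct foo_st *foo) {
    size_t b = hash_of_foo_ptr(foo) % NBUCKETS;
    assert(b < NBUCKETS);
    return b;
}
```

The pointer is never dereferenced here: only its numeric value matters, which is exactly why it is converted to an integer first.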

For historical reasons (Linux predates C99), the Linux kernel uses unsigned long instead of intptr_t or uintptr_t (the signed and unsigned integer types wide enough to hold a pointer).

Also, values passed from user space to kernel space (e.g. the arguments of syscalls) are expressed in the Linux kernel as long or unsigned long (again, it is better to think of them as intptr_t or uintptr_t). And when you pass addresses from user space to kernel space, you need to think about address spaces and virtual memory, which gets complex because user code and kernel code live in different address spaces.
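A user-space sketch of that convention, under stated assumptions: demo_fill is a hypothetical handler that receives every argument as an unsigned long, the way a syscall argument arrives, and only converts the address back to a pointer where it is actually used. Real kernel code would not dereference a user address directly like this; it would go through helpers such as copy_to_user.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handler: all arguments arrive as unsigned long,
   much like system-call arguments. */
long demo_fill(unsigned long buf_addr, unsigned long len, unsigned long byte)
{
    /* Assumption of this sketch: unsigned long is as wide as a pointer.
       That holds on the ABIs Linux uses, but not on 64-bit Windows. */
    assert(sizeof(unsigned long) == sizeof(void *));

    char *buf = (char *)buf_addr;   /* integer -> pointer at the point of use */
    memset(buf, (int)byte, (size_t)len);
    return (long)len;               /* number of bytes written */
}
```

A caller would pass the buffer as (unsigned long)buf, mirroring the cast in snd_pcm_lib_read from the question.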

A quote on the topic from LDD3:

Although, conceptually, addresses are pointers, memory administration is often better accomplished by using an unsigned integer type; the kernel treats physical memory like a huge array, and a memory address is just an index into the array. Furthermore, a pointer is easily dereferenced; when dealing directly with memory addresses, you almost never want to dereference them in this manner. Using an integer type prevents this dereferencing, thus avoiding bugs. Therefore, generic memory addresses in the kernel are usually unsigned long, exploiting the fact that pointers and long integers are always the same size, at least on all the platforms currently supported by Linux.
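To make the "index into a huge array" view concrete, here is a minimal sketch of the kind of arithmetic that is natural on an unsigned long address. C does not allow shifts or bitwise AND on pointer operands, so this only works once the address is an integer. The DEMO_ names and the 4 KiB page size are assumptions for illustration, not the kernel's own definitions.

```c
#include <assert.h>

#define DEMO_PAGE_SHIFT 12UL                       /* assumed 4 KiB pages */
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK  (~(DEMO_PAGE_SIZE - 1))

/* Index of the page containing addr, treating memory as one big array. */
unsigned long demo_page_number(unsigned long addr)
{
    return addr >> DEMO_PAGE_SHIFT;
}

/* Offset of addr within its page. */
unsigned long demo_page_offset(unsigned long addr)
{
    return addr & ~DEMO_PAGE_MASK;
}

/* Round addr up to the next page boundary. */
unsigned long demo_page_align(unsigned long addr)
{
    unsigned long aligned = (addr + DEMO_PAGE_SIZE - 1) & DEMO_PAGE_MASK;
    assert((aligned & (DEMO_PAGE_SIZE - 1)) == 0);
    return aligned;
}
```

None of these functions can dereference the address by accident, which is the bug-avoidance point the quote makes.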

Note that the Linux kernel is not written in hosted, strictly standard, portable C99. It is a freestanding program, written with a few particular C compilers in mind (GCC, and perhaps in the future also Clang/LLVM).

The Linux kernel uses some GCC extensions to C (e.g. __attribute__-s, builtins, computed goto-s, etc.), often wrapped in macros. Most of these extensions are also supported by Clang/LLVM.
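For a flavor of what "wrapped with macros" can look like, here is a small sketch that compiles with GCC or Clang. The demo_ names are hypothetical; the kernel's own likely/unlikely macros have a similar shape, built on __builtin_expect.

```c
#include <assert.h>
#include <stddef.h>

/* Branch-prediction hints built on a GCC/Clang builtin. */
#define demo_likely(x)   __builtin_expect(!!(x), 1)
#define demo_unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical wrapper around a GCC attribute. */
#define demo_maybe_unused __attribute__((unused))

/* Sum an array, hinting to the compiler that the empty case is rare. */
long demo_sum(const int *v, size_t n)
{
    demo_maybe_unused int spare = 0;   /* attribute silences the unused warning */
    long total = 0;

    if (demo_unlikely(n == 0))
        return 0;
    assert(v != NULL);                 /* a NULL array with n > 0 is a caller bug */
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

The hints do not change what the function computes; they only influence how the compiler lays out the branches.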

Kernel Newbies and the LKML are probably good places to ask about that.

Licensed under: CC-BY-SA with attribution