Checking array size in C/C++ to avoid segmentation faults

https://softwareengineering.stackexchange.com/questions/294092

10-10-2020

Question

So it's well known that C does not do any array bounds checking when accessing memory. Nowadays, if you access myArray[7] after declaring it as int myArray[3], your program may get a segfault and crash thanks to protected memory.

Now, if a function takes an argument such as myFunc(int *yourArray), but you know you need at least 8 slots in the array, is it possible to check beforehand whether yourArray[7] is illegal, in order to throw a custom error such as:

"Sorry, yourArray is too small for this function. We need 8 ints of space."

rather than

"Segmentation fault."

Solution

Checking array bounds like you want is implementation specific, because buffer overflow is an example of undefined behavior (and this explains why UB can be really bad).

It is also an undecidable problem in general. You can easily show that statically finding every buffer overflow (by static program analysis, e.g. of the C++ source code, without actually running the program) is at least as hard as the halting problem. See also Rice's theorem.

However, several (partial) practical tools exist (notably on Linux):

- You could add assert or static_assert checks to your code, and/or explicit runtime checks.
- You might use a static code analyzer such as Frama-C (which currently works on C code).
- You could customize your GCC compiler using MELT.
- You should compile your code with all warnings and debug info, e.g. g++ -Wall -Wextra -g if using GCC.
- You might run your program under valgrind, at least for tests.
- You could use the AddressSanitizer, e.g. by adding -fsanitize=address to your compilation flags (when testing).
- Notably in C (and sometimes in C++), it is a good convention to pass both an array pointer and its size, as snprintf(3) and strncmp(3) do. In C, you might also use a flexible array member in a struct and store the array's size inside the struct itself.
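As a minimal sketch of the flexible-array-member idea (C99 or later), the struct below keeps its element count next to the data, and the accessors refuse out-of-range indices instead of touching memory. The names `IntArray`, `intarray_new`, `intarray_get` and `intarray_set` are hypothetical, chosen only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical type: the element count travels with the elements. */
typedef struct {
    size_t len;   /* number of ints in data[] */
    int data[];   /* flexible array member: must be the last member */
} IntArray;

/* Allocate header and elements in one block; NULL on overflow or failure. */
IntArray *intarray_new(size_t len)
{
    if (len > (SIZE_MAX - sizeof(IntArray)) / sizeof(int))
        return NULL;
    IntArray *a = malloc(sizeof(IntArray) + len * sizeof(int));
    if (a != NULL)
        a->len = len;
    return a;
}

/* Bounds-checked read: returns false instead of reading past the end. */
bool intarray_get(const IntArray *a, size_t i, int *out)
{
    if (i >= a->len)
        return false;
    *out = a->data[i];
    return true;
}

/* Bounds-checked write: returns false instead of writing past the end. */
bool intarray_set(IntArray *a, size_t i, int value)
{
    if (i >= a->len)
        return false;
    a->data[i] = value;
    return true;
}
```

The protection only holds while all access goes through these functions; `a->data[i]` written directly is as unchecked as any other array access. A single `free(a)` releases the whole object.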

Incidentally, pointer arithmetic in C and C++ makes finding buffer overflows even harder.

In C++11 and later, you'd better avoid plain arrays and raw pointers and use standard containers and smart pointers instead.

OTHER TIPS

The answer is really fairly simple: if you want safety, use something that actually provides it, and that's neither C nor raw C-style arrays.

Without departing too far from the basic style of C and raw arrays, you can use C++ and a std::vector, replace [i] with .at(i), and get bounds checking: .at() throws std::out_of_range when the index is invalid.

Using std::vector instead makes most of these array problems easy to handle. You can check the current size of the vector with its .size() member function. Most of the time, you don't even need to, because when you want to add something to it you just use its .push_back() member function.

At least in theory, you can do most of the same things in C, but doing so gets relatively ugly. Although it's not terribly difficult to define a wrapper that (for example) puts a pointer and a current allocation size into a struct, you have to define functions to do all the manipulation on it, and even then you have to live with the fact that existing code won't know how to use it or deal with it. I've done this a few times, and if you need it badly enough you can make it work, but I long ago decided it just wasn't worth the pain.
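A minimal sketch of the kind of wrapper described above, assuming a growable array of `int`. `IntVec` and its functions are hypothetical names; `intvec_at` is modeled loosely on `std::vector::at()`, but reports failure through its return value because C has no exceptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: pointer plus element count and allocated capacity. */
typedef struct {
    int   *data;
    size_t size;      /* elements in use */
    size_t capacity;  /* elements allocated */
} IntVec;

void intvec_init(IntVec *v)
{
    v->data = NULL;
    v->size = 0;
    v->capacity = 0;
}

/* Append one element, doubling the buffer when full; false on allocation failure. */
bool intvec_push(IntVec *v, int value)
{
    if (v->size == v->capacity) {
        size_t new_cap = v->capacity ? v->capacity * 2 : 4;
        if (new_cap > SIZE_MAX / sizeof(int))
            return false;
        int *p = realloc(v->data, new_cap * sizeof(int));
        if (p == NULL)
            return false;   /* old buffer is still valid */
        v->data = p;
        v->capacity = new_cap;
    }
    v->data[v->size++] = value;
    return true;
}

/* Bounds-checked read against the number of elements in use. */
bool intvec_at(const IntVec *v, size_t i, int *out)
{
    if (i >= v->size)
        return false;
    *out = v->data[i];
    return true;
}

void intvec_free(IntVec *v)
{
    free(v->data);
    intvec_init(v);   /* leave the struct in a safe, empty state */
}
```

Doubling the capacity keeps appends amortized constant time. As the paragraph above says, any existing function that expects a plain `int *` only sees `v.data` and gets none of these checks.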

A function that receives a pointer does not know the length of the corresponding array. You must pass the length explicitly as a parameter yourself:

```c
void myFunc(int *yourArray, size_t yourArrayLen);
```

Once you've done that, throwing an error is trivial.
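A sketch of that check in plain C, assuming the function reports failure through a return code (C has no exceptions) and, as in the question, needs at least 8 ints. `MYFUNC_MIN_LEN` and the fill loop are placeholders for illustration, not part of any real API.

```c
#include <stddef.h>
#include <stdio.h>

#define MYFUNC_MIN_LEN 8   /* hypothetical: the 8 ints the question needs */

/* Returns 0 on success, -1 if the buffer is missing or too small.
   The length is trusted: the function cannot verify it independently. */
int myFunc(int *yourArray, size_t yourArrayLen)
{
    if (yourArray == NULL || yourArrayLen < MYFUNC_MIN_LEN) {
        fprintf(stderr, "Sorry, yourArray is too small for this function. "
                        "We need %d ints of space.\n", MYFUNC_MIN_LEN);
        return -1;
    }
    /* Only touch the slots whose existence we just checked. */
    for (size_t i = 0; i < MYFUNC_MIN_LEN; i++)
        yourArray[i] = (int)i;
    return 0;
}
```

At the call site, `myFunc(big, sizeof big / sizeof big[0])` computes the length correctly only because `big` is an actual array in scope there, not a pointer.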

Of course, this still leaves the possibility that your caller might give you the wrong length. You can't really prevent that without either:

- implementing a custom data type to store arrays, and using encapsulation to make sure the stored length always stays in sync with the true length, or
- accepting only fixed-size arrays, e.g. by taking a pointer to an array of exactly 8 ints:
```
void myFunc(int (*yourArray)[8]);
```
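With a pointer-to-array parameter, the size becomes part of the type. In this sketch the hypothetical `sum8` stands in for `myFunc`; the caller passes `&arr`, and passing the address of an array of a different length, such as an `int[3]`, is an incompatible pointer type that the compiler diagnoses (as a warning or an error, depending on compiler and flags).

```c
#include <stddef.h>

/* yourArray has type "pointer to array of exactly 8 ints". */
long sum8(int (*yourArray)[8])
{
    long total = 0;
    /* *yourArray is an int[8], so sizeof gives the whole array's size here. */
    for (size_t i = 0; i < sizeof *yourArray / sizeof((*yourArray)[0]); i++)
        total += (*yourArray)[i];
    return total;
}
```

For example, with `int arr[8] = {1, 2, 3, 4, 5, 6, 7, 8};`, `sum8(&arr)` returns 36. The trade-off is rigidity: the function accepts exactly 8 elements, not "at least 8".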

There is no way in C(++) to get the length of an array from a pointer to its first element. (There are platform-specific functions like _msize in MSVCRT, but they only work on blocks obtained from malloc, calloc or realloc.)
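The following sketch shows why: even a parameter written as `int a[8]` is adjusted to `int *`, so `sizeof` inside the function yields the size of a pointer. `ARRAY_LEN` is a hypothetical helper macro, valid only where the real array object is in scope; GCC and Clang both warn about `sizeof` applied to an array parameter, which is exactly the mistake shown here.

```c
#include <stddef.h>

/* Element count of a true array; silently wrong if given a pointer. */
#define ARRAY_LEN(arr) (sizeof(arr) / sizeof((arr)[0]))

/* The [8] is ignored by the type system: a is really an int *. */
size_t size_seen_by_callee(int a[8])
{
    return sizeof a;   /* size of a pointer, NOT 8 * sizeof(int) */
}
```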

What's typically done when passing arrays to functions is to pass the length along with the pointer so that bounds-checking can be done at runtime.

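```c
#include <stddef.h>
#include <stdio.h>

#define LEN 8  /* example length; any value >= 8 passes the check */

void myFunc(int *yourArray, size_t length)
{
    if (length < 8)
    {
        puts("Sorry, yourArray is too small for this function. We need 8 ints of space.");
        return;
    }

    // ... use yourArray[0] through yourArray[7] here
}

void caller(void)
{
    int arr[LEN];
    myFunc(arr, LEN);
}
```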

Use a custom wrapper for malloc (or write your own allocator) that keeps additional information about the blocks it allocates. The one I use adds a few "guard bytes" to every allocation, stores the length of the allocation just before the block (so it can be read as a[-1]), and checks the guard bytes and other things upon deallocation.
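A minimal sketch of such a wrapper, assuming one particular layout: a header holding the requested size (padded via `max_align_t`, C11, so the returned pointer stays suitably aligned), then the user bytes, then a few guard bytes. All names (`guarded_malloc`, `guarded_size`, `guarded_check`, `guarded_free`) and the guard value are hypothetical; this is not the answerer's actual code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Layout: [Header: requested size][n user bytes][GUARD_SIZE guard bytes] */
#define GUARD_SIZE 8
#define GUARD_BYTE 0xFD

typedef union {
    size_t      size;    /* requested size, readable "at p[-1]" */
    max_align_t align;   /* pads the header to the strictest alignment */
} Header;

void *guarded_malloc(size_t n)
{
    if (n > SIZE_MAX - sizeof(Header) - GUARD_SIZE)
        return NULL;
    unsigned char *raw = malloc(sizeof(Header) + n + GUARD_SIZE);
    if (raw == NULL)
        return NULL;
    ((Header *)raw)->size = n;
    unsigned char *user = raw + sizeof(Header);
    memset(user + n, GUARD_BYTE, GUARD_SIZE);   /* fill the trailing guard */
    return user;
}

/* Length recorded at allocation time, stored just before the block. */
size_t guarded_size(const void *p)
{
    return ((const Header *)p - 1)->size;
}

/* True if the trailing guard bytes are still intact. */
bool guarded_check(const void *p)
{
    const unsigned char *user = p;
    size_t n = guarded_size(p);
    for (size_t i = 0; i < GUARD_SIZE; i++)
        if (user[n + i] != GUARD_BYTE)
            return false;
    return true;
}

/* Checks the guard, then frees; returns false if an overrun was detected. */
bool guarded_free(void *p)
{
    if (p == NULL)
        return true;
    bool ok = guarded_check(p);
    free((Header *)p - 1);
    return ok;
}
```

This only detects overruns that land in the guard bytes, and only when `guarded_check` or `guarded_free` runs; a tool such as AddressSanitizer reports the offending access itself.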

Licensed under: CC-BY-SA with attribution
