Getting the highest allocated file descriptor

https://stackoverflow.com/questions/899038

23-08-2019
|

Question

Is there a portable way (POSIX) to get the highest allocated file descriptor number for the current process?

I know that there's a nice way to get the number on AIX, for example, but I'm looking for a portable method.

The reason I'm asking is that I want to close all open file descriptors. My program is a server which runs as root and forks and execs child programs for non-root users. Leaving the privileged file descriptors open in the child process is a security problem. Some file descriptors may be opened by code I cannot control (the C library, third party libraries, etc.), so I cannot rely on FD_CLOEXEC either.

Solution

While portable, closing all file descriptors up to sysconf(_SC_OPEN_MAX) is not reliable, because on most systems this call returns the current file descriptor soft limit, which could have been lowered below the highest used file descriptor. Another issue is that on many systems sysconf(_SC_OPEN_MAX) may return INT_MAX, which can cause this approach to be unacceptably slow. Unfortunately, there is no reliable, portable alternative that does not involve iterating over every possible non-negative int file descriptor.

Although not portable, most operating systems in common use today provide one or more of the following solutions to this problem:

A library function to close all file descriptors >= fd. This is the simplest solution for the common case of closing all file descriptors, although it cannot be used for much else. To close all file descriptors except for a certain set, dup2 can be used to move them to the low end beforehand, and to move them back afterward if necessary.
- closefrom(fd) (Solaris 9 or later, FreeBSD 7.3 or 8.0 and later, NetBSD 3.0 or later, OpenBSD 3.5 or later.)
- fcntl(fd, F_CLOSEM, 0) (AIX, IRIX, NetBSD)
A library function to provide the maximum file descriptor currently in use by the process. To close all file descriptors above a certain number, either close all of them up to this maximum, or continually get and close the highest file descriptor in a loop until the low bound is reached. Which is more efficient depends on the file descriptor density.
- fcntl(0, F_MAXFD) (NetBSD)
- pstat_getproc(&ps, sizeof(struct pst_status), (size_t)0, (int)getpid())
  Returns information about the process, including the highest file descriptor currently open in ps.pst_highestfd. (HP-UX)
A directory containing an entry for each open file descriptor. This is the most flexible approach as it allows for closing all file descriptors, finding the highest file descriptor, or doing just about anything else on every open file descriptor, even those of another process (on most systems). However this can be more complicated than the other approaches for the common uses. Also, it can fail for a variety of reasons such as proc/fdescfs not mounted, a chroot environment, or no file descriptors available to open the directory (process or system limit). Therefore use of this approach is often combined with a fallback mechanism. Example (OpenSSH), another example (glib).
- /proc/pid/fd/ or /proc/self/fd/ (Linux, Solaris, AIX, Cygwin, NetBSD)
  (AIX does not support "self")
- /dev/fd/ (FreeBSD, Darwin, OS X)
It can be difficult to handle all corner cases reliably with this approach. For example consider the situation where all file descriptors >= fd are to be closed, but all file descriptors < fd are used, the current process resource limit is fd, and there are file descriptors >= fd in use. Because the process resource limit has been reached the directory cannot be opened. If closing every file descriptor from fd through the resource limit or sysconf(_SC_OPEN_MAX) is used as a fallback, nothing will be closed.

OTHER TIPS

The POSIX way is:

int maxfd=sysconf(_SC_OPEN_MAX);
for(int fd=3; fd<maxfd; fd++)
    close(fd);

(note that's closing from 3 up, to keep stdin/stdout/stderr open)

close() harmlessly returns EBADF if the file descriptor is not open. There's no need to waste another system call checking.

Some Unixes support a closefrom(). This avoids the excessive number of calls to close() depending on the maximum possible file descriptor number. While the best solution I'm aware of, it's completely nonportable.

I've written code to deal with all platform-specific features. All functions are async-signal safe. Thought people might find this useful. Only tested on OS X right now, feel free to improve/fix.

// Async-signal safe way to get the current process's hard file descriptor limit.
static int
getFileDescriptorLimit() {
    long long sysconfResult = sysconf(_SC_OPEN_MAX);

    struct rlimit rl;
    long long rlimitResult;
    if (getrlimit(RLIMIT_NOFILE, &rl) == -1) {
        rlimitResult = 0;
    } else {
        rlimitResult = (long long) rl.rlim_max;
    }

    long result;
    if (sysconfResult > rlimitResult) {
        result = sysconfResult;
    } else {
        result = rlimitResult;
    }
    if (result < 0) {
        // Both calls returned errors.
        result = 9999;
    } else if (result < 2) {
        // The calls reported broken values.
        result = 2;
    }
    return result;
}

// Async-signal safe function to get the highest file
// descriptor that the process is currently using.
// See also http://stackoverflow.com/questions/899038/getting-the-highest-allocated-file-descriptor
static int
getHighestFileDescriptor() {
#if defined(F_MAXFD)
    int ret;

    do {
        ret = fcntl(0, F_MAXFD);
    } while (ret == -1 && errno == EINTR);
    if (ret == -1) {
        ret = getFileDescriptorLimit();
    }
    return ret;

#else
    int p[2], ret, flags;
    pid_t pid = -1;
    int result = -1;

    /* Since opendir() may not be async signal safe and thus may lock up
     * or crash, we use it in a child process which we kill if we notice
     * that things are going wrong.
     */

    // Make a pipe.
    p[0] = p[1] = -1;
    do {
        ret = pipe(p);
    } while (ret == -1 && errno == EINTR);
    if (ret == -1) {
        goto done;
    }

    // Make the read side non-blocking.
    do {
        flags = fcntl(p[0], F_GETFL);
    } while (flags == -1 && errno == EINTR);
    if (flags == -1) {
        goto done;
    }
    do {
        fcntl(p[0], F_SETFL, flags | O_NONBLOCK);
    } while (ret == -1 && errno == EINTR);
    if (ret == -1) {
        goto done;
    }

    do {
        pid = fork();
    } while (pid == -1 && errno == EINTR);

    if (pid == 0) {
        // Don't close p[0] here or it might affect the result.

        resetSignalHandlersAndMask();

        struct sigaction action;
        action.sa_handler = _exit;
        action.sa_flags   = SA_RESTART;
        sigemptyset(&action.sa_mask);
        sigaction(SIGSEGV, &action, NULL);
        sigaction(SIGPIPE, &action, NULL);
        sigaction(SIGBUS, &action, NULL);
        sigaction(SIGILL, &action, NULL);
        sigaction(SIGFPE, &action, NULL);
        sigaction(SIGABRT, &action, NULL);

        DIR *dir = NULL;
        #ifdef __APPLE__
            /* /dev/fd can always be trusted on OS X. */
            dir = opendir("/dev/fd");
        #else
            /* On FreeBSD and possibly other operating systems, /dev/fd only
             * works if fdescfs is mounted. If it isn't mounted then /dev/fd
             * still exists but always returns [0, 1, 2] and thus can't be
             * trusted. If /dev and /dev/fd are on different filesystems
             * then that probably means fdescfs is mounted.
             */
            struct stat dirbuf1, dirbuf2;
            if (stat("/dev", &dirbuf1) == -1
             || stat("/dev/fd", &dirbuf2) == -1) {
                _exit(1);
            }
            if (dirbuf1.st_dev != dirbuf2.st_dev) {
                dir = opendir("/dev/fd");
            }
        #endif
        if (dir == NULL) {
            dir = opendir("/proc/self/fd");
            if (dir == NULL) {
                _exit(1);
            }
        }

        struct dirent *ent;
        union {
            int highest;
            char data[sizeof(int)];
        } u;
        u.highest = -1;

        while ((ent = readdir(dir)) != NULL) {
            if (ent->d_name[0] != '.') {
                int number = atoi(ent->d_name);
                if (number > u.highest) {
                    u.highest = number;
                }
            }
        }
        if (u.highest != -1) {
            ssize_t ret, written = 0;
            do {
                ret = write(p[1], u.data + written, sizeof(int) - written);
                if (ret == -1) {
                    _exit(1);
                }
                written += ret;
            } while (written < (ssize_t) sizeof(int));
        }
        closedir(dir);
        _exit(0);

    } else if (pid == -1) {
        goto done;

    } else {
        do {
            ret = close(p[1]);
        } while (ret == -1 && errno == EINTR);
        p[1] = -1;

        union {
            int highest;
            char data[sizeof(int)];
        } u;
        ssize_t ret, bytesRead = 0;
        struct pollfd pfd;
        pfd.fd = p[0];
        pfd.events = POLLIN;

        do {
            do {
                // The child process must finish within 30 ms, otherwise
                // we might as well query sysconf.
                ret = poll(&pfd, 1, 30);
            } while (ret == -1 && errno == EINTR);
            if (ret <= 0) {
                goto done;
            }

            do {
                ret = read(p[0], u.data + bytesRead, sizeof(int) - bytesRead);
            } while (ret == -1 && ret == EINTR);
            if (ret == -1) {
                if (errno != EAGAIN) {
                    goto done;
                }
            } else if (ret == 0) {
                goto done;
            } else {
                bytesRead += ret;
            }
        } while (bytesRead < (ssize_t) sizeof(int));

        result = u.highest;
        goto done;
    }

done:
    if (p[0] != -1) {
        do {
            ret = close(p[0]);
        } while (ret == -1 && errno == EINTR);
    }
    if (p[1] != -1) {
        do {
            close(p[1]);
        } while (ret == -1 && errno == EINTR);
    }
    if (pid != -1) {
        do {
            ret = kill(pid, SIGKILL);
        } while (ret == -1 && errno == EINTR);
        do {
            ret = waitpid(pid, NULL, 0);
        } while (ret == -1 && errno == EINTR);
    }

    if (result == -1) {
        result = getFileDescriptorLimit();
    }
    return result;
#endif
}

void
closeAllFileDescriptors(int lastToKeepOpen) {
    #if defined(F_CLOSEM)
        int ret;
        do {
            ret = fcntl(lastToKeepOpen + 1, F_CLOSEM);
        } while (ret == -1 && errno == EINTR);
        if (ret != -1) {
            return;
        }
    #elif defined(HAS_CLOSEFROM)
        closefrom(lastToKeepOpen + 1);
        return;
    #endif

    for (int i = getHighestFileDescriptor(); i > lastToKeepOpen; i--) {
        int ret;
        do {
            ret = close(i);
        } while (ret == -1 && errno == EINTR);
    }
}

Right when your program started and hasn't opened anything. E.g. like the start of main(). pipe and fork immediately starting an executer server. This way it's memory and other details is clean and you can just give it things to fork & exec.

#include <unistd.h>
#include <stdio.h>
#include <memory.h>
#include <stdlib.h>

struct PipeStreamHandles {
    /** Write to this */
    int output;
    /** Read from this */
    int input;

    /** true if this process is the child after a fork */
    bool isChild;
    pid_t childProcessId;
};

PipeStreamHandles forkFullDuplex(){
    int childInput[2];
    int childOutput[2];

    pipe(childInput);
    pipe(childOutput);

    pid_t pid = fork();
    PipeStreamHandles streams;
    if(pid == 0){
        // child
        close(childInput[1]);
        close(childOutput[0]);

        streams.output = childOutput[1];
        streams.input = childInput[0];
        streams.isChild = true;
        streams.childProcessId = getpid();
    } else {
        close(childInput[0]);
        close(childOutput[1]);

        streams.output = childInput[1];
        streams.input = childOutput[0];
        streams.isChild = false;
        streams.childProcessId = pid;
    }

    return streams;
}


struct ExecuteData {
    char command[2048];
    bool shouldExit;
};

ExecuteData getCommand() {
    // maybe use json or semething to read what to execute
    // environment if any and etc..        
    // you can read via stdin because of the dup setup we did
    // in setupExecutor
    ExecuteData data;
    memset(&data, 0, sizeof(data));
    data.shouldExit = fgets(data.command, 2047, stdin) == NULL;
    return data;
}

void executorServer(){

    while(true){
        printf("executor server waiting for command\n");
        // maybe use json or semething to read what to execute
        // environment if any and etc..        
        ExecuteData command = getCommand();
        // one way is for getCommand() to check if stdin is gone
        // that way you can set shouldExit to true
        if(command.shouldExit){
            break;
        }
        printf("executor server doing command %s", command.command);
        system(command.command);
        // free command resources.
    }
}

static PipeStreamHandles executorStreams;
void setupExecutor(){
    PipeStreamHandles handles = forkFullDuplex();

    if(handles.isChild){
        // This simplifies so we can just use standard IO 
        dup2(handles.input, 0);
        // we comment this out so we see output.
        // dup2(handles.output, 1);
        close(handles.input);
        // we uncomment this one so we can see hello world
        // if you want to capture the output you will want this.
        //close(handles.output);
        handles.input = 0;
        handles.output = 1;
        printf("started child\n");
        executorServer();
        printf("exiting executor\n");
        exit(0);
    }

    executorStreams = handles;
}

/** Only has 0, 1, 2 file descriptiors open */
pid_t cleanForkAndExecute(const char *command) {
    // You can do json and use a json parser might be better
    // so you can pass other data like environment perhaps.
    // and also be able to return details like new proccess id so you can
    // wait if it's done and ask other relevant questions.
    write(executorStreams.output, command, strlen(command));
    write(executorStreams.output, "\n", 1);
}

int main () {
    // needs to be done early so future fds do not get open
    setupExecutor();

    // run your program as usual.
    cleanForkAndExecute("echo hello world");
    sleep(3);
}

If you want to do IO on the executed program the executor server will have to do socket redirects and you can use unix sockets.

Why don't you close all descriptors from 0 to, say, 10000.

It would be pretty fast, and the worst thing that would happen is EBADF.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow