Question

I am using pread to read a large amount of data in a single call.

But if I try to read a large amount of data (for instance 100 MB) into an array, I get a segfault.

Is there a hard limit on the maximum number of bytes a single pread can read?

#define          _FILE_OFFSET_BITS                            64 
#define          BLKGETSIZE64                                _IOR(0x12,114,size_t)
#define          _POSIX_C_SOURCE                             200809L

#include <stdio.h>
#include <inttypes.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int readdata(int fp,uint64_t seekpoint, uint64_t seekwidth) {
    int16_t  buf[seekwidth];

    if (pread(fp,buf,seekwidth,seekpoint)==seekwidth) {
        printf("SUCCES READING AT: %"PRIu64"| WITH READ WIDTH: %"PRIu64"\n",seekpoint,seekwidth);
        return 1;
    } else {
        printf("ERROR READING AT: %"PRIu64"| WITH READ WIDTH: %"PRIu64"\n",seekpoint,seekwidth);
        return 2;
    }

}





int main() {

    uint64_t    readwith,
                offset;
    int         fp=open("/dev/sdc",O_RDWR);

    readwith=10000;   offset=0;
    readdata(fp,offset,readwith);
    readwith=100000;   offset=0;
    readdata(fp,offset,readwith);
    readwith=1000000;   offset=0;
    readdata(fp,offset,readwith);
    readwith=10000000;   offset=0;
    readdata(fp,offset,readwith);
    readwith=10000000;   offset=0;
    readdata(fp,offset,readwith);
    readwith=100000000;   offset=0;
    readdata(fp,offset,readwith);
    readwith=1000000000;   offset=0;
    readdata(fp,offset,readwith);
    readwith=10000000000;   offset=0;
    readdata(fp,offset,readwith);
    readwith=100000000000;   offset=0;
    readdata(fp,offset,readwith);
    readwith=1000000000000;   offset=0;
    readdata(fp,offset,readwith);
    readwith=10000000000000;   offset=0;
    readdata(fp,offset,readwith);
    readwith=100000000000000;   offset=0;
    readdata(fp,offset,readwith);
    readwith=1000000000000000;   offset=0;
    readdata(fp,offset,readwith);

    close(fp);

}

Solution

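A limit in pread is not what is crashing your program. A single call is capped, though: on Linux, read and its relatives transfer at most 0x7ffff000 (2,147,479,552) bytes per call and return the number of bytes actually transferred, so a very large request can legitimately come back short rather than fail. Reading that much data in one contiguous block is probably a bad idea anyway. There are a few alternatives I'll describe below.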

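In your particular case, the problem is most likely that you are allocating the buffer on the stack: int16_t buf[seekwidth] is a variable-length array of 2 × seekwidth bytes. There is a limited amount of space available to the stack; if you run cat /proc/<pid>/limits, you can see what that limit is for a particular process (or just cat /proc/self/limits to see the limits inherited from the shell you're running). In my case, the "Max stack size" soft limit is 8388608 bytes, or 8 MB. With that limit, the call with seekwidth set to 10000000 already asks for about 20 MB of stack. If you try to use more than the limit, you will typically get a segfault.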
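If you'd rather check the limit from inside the program than through /proc, something like the following should work. It's only a sketch: stack_soft_limit and too_big_for_stack are names I made up for illustration, and the comparison is deliberately crude, since the stack also has to hold everything else your call chain needs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical helper: stores the soft stack limit (in bytes) in *out.
 * Returns 0 on success, -1 if getrlimit() fails. */
int stack_soft_limit(rlim_t *out) {
    struct rlimit rl;
    assert(out != NULL);
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    *out = rl.rlim_cur;          /* may be RLIM_INFINITY */
    return 0;
}

/* Hypothetical helper: returns 1 if a buffer of `bytes` bytes clearly
 * cannot fit on the stack, 0 if it might fit or the limit is unknown. */
int too_big_for_stack(size_t bytes) {
    rlim_t lim;
    if (stack_soft_limit(&lim) != 0 || lim == RLIM_INFINITY)
        return 0;                /* unknown or unlimited: cannot tell */
    return bytes >= lim;
}
```

With an 8 MB limit, too_big_for_stack(2 * 10000000) would return 1, which matches the point where the code in the question starts to fail.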

You can increase the maximum stack size using setrlimit or the Linux-specific prlimit, but that's generally not considered a good thing to do. Your stack is permanently allocated to each thread, so increasing its size increases how much address space each thread has reserved for it. On 32-bit systems (which are becoming less relevant, but there are still 32-bit systems out there, as well as 32-bit applications on 64-bit systems), address space is actually fairly limited, so just a few threads with a large amount of stack space could exhaust it. It would be better to take an alternate approach.

One such alternate approach is to use malloc to dynamically allocate your buffer. Then you will only use this space when you need it, not all the time for your whole stack. Yes, you do have to remember to free the data afterwards, but that's not all that hard with a little bit of careful thought in your programming.
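Here's roughly what that could look like for your readdata function. Treat it as a sketch under a few assumptions: read_to_heap is a name I invented, it gives up on any error (including EINTR) instead of retrying, and it treats hitting end-of-file before `width` bytes as a failure. It loops because pread is allowed to return fewer bytes than requested.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical replacement for readdata(): returns a heap buffer holding
 * `width` bytes read from `fd` starting at `offset`, or NULL on failure.
 * The caller owns the buffer and must free() it. */
void *read_to_heap(int fd, off_t offset, size_t width) {
    assert(fd >= 0);

    /* malloc(0) may return NULL, so always ask for at least one byte */
    char *buf = malloc(width ? width : 1);
    if (buf == NULL)
        return NULL;

    size_t done = 0;
    while (done < width) {
        /* pread may return less than requested, so keep going */
        ssize_t n = pread(fd, buf + done, width - done, offset + (off_t)done);
        if (n <= 0) {            /* error (n < 0) or end of file (n == 0) */
            free(buf);
            return NULL;
        }
        done += (size_t)n;
    }
    return buf;
}
```

The calling code then does something like `int16_t *data = read_to_heap(fd, 0, 100000000);`, checks for NULL, uses the data, and calls free(data) when it's finished.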

Another approach, which can be good for large amounts of data like this, is to use mmap to map the file into your address space instead of trying to read the whole thing into a buffer. This allocates a region of address space, and any time you access that region, the data is read from the file to populate the page you are touching. This can be very handy when you want random access to the file but will not actually read the whole thing, skipping around in it instead. Only the pages that you actually access will be read, rather than wasting time reading the whole file into a buffer and then accessing only portions of it.

If you use mmap, you will need to remember to munmap the region afterwards, though on a 64-bit system forgetting to munmap matters a lot less than forgetting to free allocated memory (on a 32-bit system, address space is at a premium, so leaving large mappings around can still cause problems). mmap only uses up address space, not actual memory. Since the mapping is backed by a file, under memory pressure the kernel can just write any dirty data out to disk and stop keeping the contents in memory, whereas for an allocated buffer it has to preserve the data in RAM or swap space, which are generally fairly limited resources. And if you're only reading, it doesn't even have to flush dirty data to disk: it can just free the page and reuse it, and if you access the page again, it will read it back in.

If you don't need random access to all of that data at once, it's probably better to just read and process the data in smaller chunks, in a loop. Then you can use stack allocation for its simplicity, without worrying about increasing the amount of address space allocated to your stack.
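As an illustration of the chunked approach, here's a small sketch that adds up every byte in a file while only ever holding one chunk in memory. The function name sum_file_bytes, the 64 KB CHUNK_SIZE, and the byte-summing "processing" are all placeholders I picked for the example; substitute whatever per-chunk work you actually need.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_SIZE (64 * 1024)   /* assumed chunk size, comfortably small for the stack */

/* Hypothetical example: reads the file at `path` in CHUNK_SIZE pieces and
 * stores the sum of all its bytes in *sum. Returns 0 on success, -1 on error. */
int sum_file_bytes(const char *path, uint64_t *sum) {
    unsigned char buf[CHUNK_SIZE];   /* fixed-size stack buffer, reused each pass */
    off_t offset = 0;

    assert(path != NULL && sum != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    *sum = 0;
    for (;;) {
        ssize_t n = pread(fd, buf, sizeof buf, offset);
        if (n < 0) {                 /* read error */
            close(fd);
            return -1;
        }
        if (n == 0)                  /* end of file or device */
            break;
        for (ssize_t i = 0; i < n; i++)
            *sum += buf[i];          /* "process" this chunk */
        offset += n;
    }
    close(fd);
    return 0;
}
```

Memory use stays at CHUNK_SIZE bytes no matter how big the file or device is, which is the whole point of processing in a loop.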

Edit to add: Based on your sample code and your other question, you seem to be trying to read an entire 2 TB disk as a single array. In this case, you will definitely need to use mmap (in a 64-bit process, so that the mapping fits in the address space), as you likely don't have enough RAM to hold the entire contents in memory. Here's an example; note that this particular example is specific to Linux:

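#include <stdio.h>
#include <inttypes.h>
#include <err.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc != 2)
        errx(1, "Wrong number of arguments");

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0)
        err(2, "Failed to open %s", argv[1]);

    struct stat statbuf;
    if (fstat(fd, &statbuf) != 0)
        err(3, "Failed to stat %s", argv[1]);

    /* BLKGETSIZE64 always writes a 64-bit value, so use uint64_t, not size_t */
    uint64_t size;
    if (S_ISREG(statbuf.st_mode)) {
        size = (uint64_t)statbuf.st_size;
    } else if (S_ISBLK(statbuf.st_mode)) {
        /* st_size is 0 for block devices; ask the kernel for the real size */
        if (ioctl(fd, BLKGETSIZE64, &size) != 0)
            err(4, "Failed to get size of block device %s", argv[1]);
    } else {
        errx(4, "%s is neither a regular file nor a block device", argv[1]);
    }

    printf("Size: %" PRIu64 "\n", size);

    /* On a 32-bit build the device may be larger than the address space */
    if (size > SIZE_MAX)
        errx(5, "%s is too large to map in this process", argv[1]);

    char *mapping = mmap(NULL, (size_t)size, PROT_READ, MAP_SHARED, fd, 0);
    if (MAP_FAILED == mapping)
        err(5, "Failed to map %s", argv[1]);

    /* do something with `mapping` */

    /* unmap the same length that was mapped (not st_size, which is 0 for devices) */
    munmap(mapping, (size_t)size);
    close(fd);

    return 0;
}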
Licensed under: CC-BY-SA with attribution