Question

I'm trying to generate large files (4-8 GB) with C code. Currently I use fopen() with the "wb" mode to open the file in binary mode and call fwrite() in a for loop, writing one byte per iteration. There is no problem until the file size is greater than or equal to 4294967296 bytes (4096 MB). It looks like some memory limit of a 32-bit OS, because the data written to the open file is still held in RAM. Am I right? The symptom is that the created file is smaller than requested, and the difference is always 4096 MB: when I ask for a 6000 MB file, I get a 6000 MB - 4096 MB = 1904 MB file.

Could you suggest another way to do this?

Regards :)

Part of code:

unsigned long long int number_of_data = (unsigned int)atoi(argv[1])*1024*1024; //MB
char x[1]={atoi(argv[2])};

fp=fopen(strcat(argv[3],".bin"),"wb");

    for(i=0;i<number_of_data;i++) {
        fwrite(x, sizeof(x[0]), sizeof(x[0]), fp);
    }

fclose(fp);

Solution

fwrite is not the problem here. The problem is the value you are calculating for number_of_data.

You need to be careful of any unintentional 32-bit casting when dealing with 64-bit integers. When I define them, I normally do it in a number of discrete steps, being careful at each step:

unsigned long long int number_of_data = atoi(argv[1]); // Good for up to 2,147,483,647 MB (about 2 PB) with a 32-bit int
number_of_data *= 1024*1024; // Convert MB to bytes

The compound assignment operator (*=) acts on the l-value (the unsigned long long int), so the multiplication is carried out on a 64-bit value. The constant 1024*1024 is still evaluated as an int, but it fits comfortably in 32 bits.

This may look unoptimised, but a decent compiler will remove any unnecessary steps.
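To make the arithmetic concrete, here is a minimal sketch that reproduces the 1904 MB symptom from the question. The helper names mb_to_bytes_32bit and mb_to_bytes_64bit are hypothetical and exist only for illustration; the first one forces the product into 32 bits, which is what the original expression does on a platform with a 32-bit int.

```c
#include <stdint.h>

/* Hypothetical helper: mimics the question's expression, where the whole
   product is computed in 32 bits and therefore wraps modulo 2^32. */
uint64_t mb_to_bytes_32bit(uint32_t megabytes)
{
    uint32_t product = (uint32_t)(megabytes * UINT32_C(1024) * UINT32_C(1024));
    return product; /* widened only after the damage is done */
}

/* Hypothetical helper: widen to 64 bits first, then multiply. */
uint64_t mb_to_bytes_64bit(uint32_t megabytes)
{
    uint64_t bytes = megabytes;       /* step 1: store in a 64-bit variable */
    bytes *= UINT64_C(1024) * 1024;   /* step 2: multiply in 64 bits */
    return bytes;
}
```

For 6000 MB the first helper returns 1996488704 (1904 MB) and the second returns 6291456000, a difference of exactly 4294967296 bytes.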

OTHER TIPS

You should not have any problem creating large files on Windows, but I have noticed that if you use a 32-bit version of seek on the file, it then seems to decide it is a 32-bit file and thus cannot be larger than 4GB. I have had success using _open, _lseeki64 and _write when working with >4GB files on Windows. For instance:

static void
create_file_simple(const TCHAR *filename, __int64 size)
{
    int omode = _O_WRONLY | _O_CREAT | _O_TRUNC;
    int fd = _topen(filename, omode, _S_IREAD | _S_IWRITE);
    _lseeki64(fd, size, SEEK_SET);
    _write(fd, "ABCD", 4);
    _close(fd);
}

The above will create a file over 4GB without issue. However, it can be slow, because when you call _write() the file system has to actually allocate the disk blocks for you. You may find it faster to create a sparse file if you have to fill it up randomly. If you will fill the file sequentially from the beginning, then the above code will be fine. Note that if you really want to use the buffered I/O provided by fwrite, you can obtain a FILE* from a C library file descriptor using fdopen() (spelled _fdopen() in MSVC).

(In case anyone is wondering, the TCHAR, _topen and underscore prefixes are all MSVC++ quirks).

UPDATE

The original question is using sequential output for N bytes of value V. So a simple program that should actually produce the file desired is:

#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <io.h>
#include <tchar.h>
int
_tmain(int argc, TCHAR *argv[])
{
    __int64 n = 0, r = 0, size = 0x100000000LL; /* 4GB */
    char v = 'A';
    int fd;
    if (argc < 2) return 1; /* expects the output file name */
    fd = _topen(argv[1], _O_WRONLY | _O_CREAT | _O_TRUNC, _S_IREAD | _S_IWRITE);
    if (fd == -1) return 1; /* could not create the file */
    while (r != -1 && n < size) {
        r = _write(fd, &v, sizeof(v)); /* one byte per call */
        if (r >= 0) n += r;
    }
    _close(fd);
    return 0;
}

However, this will be really slow as we are only writing one byte at a time. That is something that can be improved by using a larger buffer or using buffered I/O by calling fdopen on the descriptor (fd) and switching to fwrite.
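As a rough sketch of the larger-buffer idea, the following uses only standard C, so it is not tied to the MSVC calls above. The function name fill_file and the 64 KB chunk size are assumptions made for illustration, not part of the original answer. Whether a stdio stream can actually grow past 4 GB still depends on the C runtime and the file system in use.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes `total` bytes of `value` to `path`.
   Returns 0 on success, -1 on any I/O error. */
int fill_file(const char *path, uint64_t total, unsigned char value)
{
    /* Chunk size is an arbitrary choice; static keeps it off the stack,
       at the cost of making the function non-reentrant. */
    static unsigned char chunk[64 * 1024];
    uint64_t written = 0;   /* 64-bit counter, safe past 4 GB */
    FILE *fp = fopen(path, "wb");

    if (fp == NULL)
        return -1;

    memset(chunk, value, sizeof chunk);
    while (written < total) {
        uint64_t left = total - written;
        size_t want = left < sizeof chunk ? (size_t)left : sizeof chunk;
        if (fwrite(chunk, 1, want, fp) != want) {
            fclose(fp);
            return -1;
        }
        written += want;
    }
    return fclose(fp) == 0 ? 0 : -1;
}
```

A call such as fill_file("out.bin", 6000ULL * 1024 * 1024, 'A') would request a 6000 MB file while issuing far fewer write calls than the one-byte loop.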

You have no problem with fwrite(). The problem seems to be your

unsigned long long int number_of_data = (unsigned int)atoi(argv[1])*1024*1024; //MB

which should instead be something like

uint64_t number_of_data = atoll(argv[1])*1024ULL*1024ULL;

unsigned long long would still be OK, but unsigned int * int * int will give you an unsigned int no matter how large your target variable is.
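Since atoi() and atoll() give no indication of bad input, a slightly more defensive variant could use strtoull() and check the multiplication for overflow. This is only a sketch; the function name parse_megabytes and its return convention are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parses a decimal megabyte count and converts it
   to bytes. Returns 0 on success, -1 on malformed input or overflow. */
int parse_megabytes(const char *text, uint64_t *bytes_out)
{
    const uint64_t bytes_per_mb = 1024ULL * 1024ULL;
    char *end = NULL;
    unsigned long long mb;

    if (text == NULL || *text < '0' || *text > '9')
        return -1;   /* reject empty, signed, or non-numeric input */

    errno = 0;
    mb = strtoull(text, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return -1;   /* out of range or trailing garbage */
    if (mb > UINT64_MAX / bytes_per_mb)
        return -1;   /* the product would not fit in 64 bits */

    *bytes_out = (uint64_t)mb * bytes_per_mb;
    return 0;
}
```

Every operand of the multiplication is already 64 bits wide here, so no step of the expression is evaluated in 32 bits.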

Licensed under: CC-BY-SA with attribution