Question

I'm trying to use the Linux system call sendfile() to copy a file using threads.

I'm interested in optimizing these parts of the code:

fseek(fin, size * (number) / MAX_THREADS, SEEK_SET);  
fseek(fout, size * (number) / MAX_THREADS, SEEK_SET); 
/* ... */
fwrite(buff, 1, len, fout);  

Code:

void* FileOperate::FileCpThread::threadCp(void *param)
{
    Info *ft = (Info *)param;
    FILE *fin = fopen(ft->fromfile, "r+");
    FILE *fout = fopen(ft->tofile, "w+");

    int size = getFileSize(ft->fromfile);

    int number =  ft->num;
    fseek(fin, size * (number) / MAX_THREADS, SEEK_SET);
    fseek(fout, size * (number) / MAX_THREADS, SEEK_SET);

    char buff[1024] = {'\0'};
    int len = 0;
    int total = 0;

    while((len = fread(buff, 1, sizeof(buff), fin)) > 0)
    {
        fwrite(buff, 1, len, fout);
        total += len;

        if(total > size/MAX_THREADS)
        {
            break;
        }
    }

    fclose(fin);
    fclose(fout);
    return nullptr;  /* thread functions declared void* must return a value */
}

Solution

File copying is not CPU-bound; if it were, you would likely find that the limitation is at the kernel level, and nothing you can do at the user level would parallelize it.

Such "improvements" done on mechanical drives will in fact degrade the throughput. You're wasting time seeking along the file instead of reading and writing it.

If the file is long and you don't expect to need the read or written data anytime soon, it might be tempting to use the O_DIRECT flag on open. That's a bad idea, since the O_DIRECT API is essentially broken by design.

Instead, you should use posix_fadvise on both source and destination files, with the POSIX_FADV_SEQUENTIAL and POSIX_FADV_NOREUSE advice (one call per advice value; they are not flags that can be OR-ed together). After a write (or sendfile) call finishes, advise that the data is not needed anymore by passing POSIX_FADV_DONTNEED for the range just processed. That way the page cache will only be used to the extent needed to keep the data flowing, and the pages will be recycled as soon as the data has been consumed (written to disk).
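A minimal sketch of that advice pattern, assuming a plain read/write copy loop (the function name and error handling are illustrative, not from the original code):

```cpp
// Sketch: read/write copy that keeps page-cache pressure low via
// posix_fadvise. Error handling is abbreviated.
#include <fcntl.h>
#include <unistd.h>

int copy_with_fadvise(const char *from, const char *to)
{
    int in = open(from, O_RDONLY);
    int out = open(to, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in < 0 || out < 0) return -1;

    // Each advice value is a separate call; they are not OR-able flags.
    posix_fadvise(in, 0, 0, POSIX_FADV_SEQUENTIAL);
    posix_fadvise(in, 0, 0, POSIX_FADV_NOREUSE);
    posix_fadvise(out, 0, 0, POSIX_FADV_SEQUENTIAL);
    posix_fadvise(out, 0, 0, POSIX_FADV_NOREUSE);

    static char buf[4096 * 256];          // 1 MiB, a multiple of the page size
    off_t done = 0;
    ssize_t n;
    while ((n = read(in, buf, sizeof buf)) > 0) {
        if (write(out, buf, n) != n) { close(in); close(out); return -1; }
        // Tell the kernel the pages just consumed will not be reused,
        // so it can recycle them immediately.
        posix_fadvise(in, done, n, POSIX_FADV_DONTNEED);
        posix_fadvise(out, done, n, POSIX_FADV_DONTNEED);
        done += n;
    }
    close(in);
    close(out);
    return n < 0 ? -1 : 0;
}
```

Note that POSIX_FADV_DONTNEED on the destination only drops pages already written back; on a busy system you may want an fdatasync first if you need the drop to be complete.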

The sendfile call will not push file data over to user space, so it further relieves some of the pressure on memory bandwidth and the processor cache. That's about the only other sensible, device-independent improvement you can make when copying files.

Choosing a sensible chunk size is also desirable. Given that modern drives push over 100 MB/s, you might want to push a megabyte at a time, and always a multiple of the 4096-byte page size; thus (4096*256) bytes is a decent starting chunk size to handle in a single sendfile or read/write call.
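Putting the two previous points together, a single-threaded sendfile-based copy might look like this (Linux-specific; the function name is a placeholder, and error handling is abbreviated):

```cpp
// Sketch: single-threaded copy via sendfile(2), one 1 MiB chunk per call.
// sendfile between two regular files requires Linux 2.6.33 or later.
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int copy_with_sendfile(const char *from, const char *to)
{
    int in = open(from, O_RDONLY);
    if (in < 0) return -1;
    struct stat st;
    if (fstat(in, &st) < 0) { close(in); return -1; }
    int out = open(to, O_WRONLY | O_CREAT | O_TRUNC, st.st_mode & 0777);
    if (out < 0) { close(in); return -1; }

    const size_t chunk = 4096 * 256;      // 1 MiB, a page-size multiple
    off_t offset = 0;                     // sendfile advances this for us
    while (offset < st.st_size) {
        ssize_t n = sendfile(out, in, &offset, chunk);
        if (n <= 0) { close(in); close(out); return -1; }
    }
    close(in);
    close(out);
    return 0;
}
```

The data never crosses into user space, so there is no buffer to allocate and no fread/fwrite overhead.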

Read parallelization, as you propose it, only makes sense on RAID 0 volumes, and only when both the input and output files straddle the physical disks. You can then have one thread per physical disk, up to the lesser of the number of physical disks straddled by the source and destination volumes. That's only necessary if you're not using asynchronous file I/O. With async I/O you wouldn't need more than one thread anyway, especially not if the chunk sizes are large (a megabyte or more) and the single-thread latency penalty is negligible.
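To illustrate how one thread can keep I/O in flight, here is a double-buffered copy sketch using POSIX AIO: the read of chunk k+1 is issued before chunk k is written, so reading and writing overlap without any extra threads. The function name is a placeholder, and error handling is abbreviated (glibc may require linking with -lrt):

```cpp
// Sketch: double-buffered, single-threaded copy using POSIX AIO.
#include <aio.h>
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <cstring>

int copy_with_aio(const char *from, const char *to)
{
    int in = open(from, O_RDONLY);
    if (in < 0) return -1;
    int out = open(to, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) { close(in); return -1; }

    const size_t chunk = 4096 * 256;      // 1 MiB per request
    static char buf[2][4096 * 256];
    struct aiocb cb;
    std::memset(&cb, 0, sizeof cb);
    cb.aio_fildes = in;

    int cur = 0;
    off_t offset = 0;
    // Prime the pipeline with the first read.
    cb.aio_buf = buf[cur];
    cb.aio_nbytes = chunk;
    cb.aio_offset = offset;
    if (aio_read(&cb) < 0) { close(in); close(out); return -1; }

    for (;;) {
        const struct aiocb *list[1] = { &cb };
        while (aio_error(&cb) == EINPROGRESS)
            aio_suspend(list, 1, nullptr); // wait for the in-flight read
        ssize_t n = aio_return(&cb);
        if (n <= 0) break;                 // EOF or read error
        offset += n;
        int done = cur;
        cur ^= 1;
        // Issue the next read before writing the chunk just received,
        // so the read overlaps the write.
        cb.aio_buf = buf[cur];
        cb.aio_offset = offset;
        if (aio_read(&cb) < 0) { write(out, buf[done], n); break; }
        write(out, buf[done], n);
    }
    close(in);
    close(out);
    return 0;
}
```

On a single spindle this gains little, but the structure shows why extra threads are unnecessary once the I/O itself is asynchronous.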

There's no sense in parallelizing a single file copy on SSDs either, unless you were on some very odd system indeed.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow