Tried and true simple file copying code in C?

https://stackoverflow.com/questions/1006797

06-07-2019

Question

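This looks like a simple question, but I didn't find anything similar here.

Since there is no file copy function in C, we have to implement file copying ourselves, but I don't like reinventing the wheel even for trivial stuff like that, so I'd like to ask the cloud: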

- What code would you recommend for file copying using fopen()/fread()/fwrite()?
- What code would you recommend for file copying using open()/read()/write()?

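This code should be portable (windows/mac/linux/bsd/qnx/younameit), stable, time tested, fast, memory efficient and so on. Getting into a specific system's internals to squeeze out more performance is welcome (such as getting the filesystem cluster size).

This seems like a trivial question but, for example, the source code for the cp command isn't 10 lines of C code.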

Solution

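As far as the actual I/O goes, the code I've written a million times in various guises for copying data from one stream to another goes something like this. It returns 0 on success, or -1 with errno set on error (in which case any number of bytes might have been copied).

Note that for copying regular files, you can skip the EAGAIN stuff, since regular files are always blocking I/O. But inevitably, if you write this code, someone will use it on other types of file descriptors, so consider it a freebie.

There's a file-specific optimisation that GNU cp does, which I haven't bothered with here: for long blocks of 0 bytes, instead of writing you just extend the output file by seeking off the end.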

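#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

// wait until fd is ready for the given event (POLLIN or POLLOUT)
void block(int fd, int event) {
    struct pollfd topoll;
    topoll.fd = fd;
    topoll.events = event;
    poll(&topoll, 1, -1);
    // no need to check errors - if the stream is bust then the
    // next read/write will tell us
}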

int copy_data_buffer(int fdin, int fdout, void *buf, size_t bufsize) {
    for(;;) {
       char *pos; // char *, because arithmetic on void * is not standard C
       // read data to buffer
       ssize_t bytestowrite = read(fdin, buf, bufsize);
       if (bytestowrite == 0) break; // end of input
       if (bytestowrite == -1) {
           if (errno == EINTR) continue; // signal handled
           if (errno == EAGAIN) {
               block(fdin, POLLIN);
               continue;
           }
           return -1; // error
       }

       // write data from buffer
       pos = buf;
       while (bytestowrite > 0) {
           ssize_t bytes_written = write(fdout, pos, bytestowrite);
           if (bytes_written == -1) {
               if (errno == EINTR) continue; // signal handled
               if (errno == EAGAIN) {
                   block(fdout, POLLOUT);
                   continue;
               }
               return -1; // error
           }
           bytestowrite -= bytes_written;
           pos += bytes_written;
       }
    }
    return 0; // success
}

// Default value. I think it will get close to maximum speed on most
// systems, short of using mmap etc. But porters / integrators
// might want to set it smaller, if the system is very memory
// constrained and they don't want this routine to starve
// concurrent ops of memory. And they might want to set it larger
// if I'm completely wrong and larger buffers improve performance.
// It's worth trying several MB at least once, although with huge
// allocations you have to watch for the linux 
// "crash on access instead of returning 0" behaviour for failed malloc.
#ifndef FILECOPY_BUFFER_SIZE
    #define FILECOPY_BUFFER_SIZE (64*1024)
#endif

int copy_data(int fdin, int fdout) {
    // optional exercise for reader: take the file size as a parameter,
    // and don't use a buffer any bigger than that. This prevents 
    // memory-hogging if FILECOPY_BUFFER_SIZE is very large and the file
    // is small.
    for (size_t bufsize = FILECOPY_BUFFER_SIZE; bufsize >= 256; bufsize /= 2) {
        void *buffer = malloc(bufsize);
        if (buffer != NULL) {
            int result = copy_data_buffer(fdin, fdout, buffer, bufsize);
            free(buffer);
            return result;
        }
    }
    // could use a stack buffer here instead of failing, if desired.
    // 128 bytes ought to fit on any stack worth having, but again
    // this could be made configurable.
    return -1; // errno is ENOMEM
}

To open the input file:

int fdin = open(infile, O_RDONLY|O_BINARY, 0);
if (fdin == -1) return -1;

Opening the output file is tricky. As a basis, you want:

int fdout = open(outfile, O_WRONLY|O_BINARY|O_CREAT|O_TRUNC, 0x1ff);
if (fdout == -1) {
    close(fdin);
    return -1;
}
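Putting the two open() calls and the copy loop together, a complete function might look like the sketch below. `copy_file` and `copy_fd` are hypothetical names. `copy_fd` is a simplified, blocking-only stand-in for copy_data() above, included so the example compiles on its own. O_BINARY is not part of POSIX, so the sketch defines it to 0 where the platform does not provide it.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#ifndef O_BINARY
#define O_BINARY 0 /* assumption: a no-op on platforms without text mode */
#endif

/* Simplified stand-in for copy_data(): blocking descriptors only. */
static int copy_fd(int fdin, int fdout) {
    char buf[8192];
    for (;;) {
        ssize_t n = read(fdin, buf, sizeof buf);
        if (n == 0) return 0; /* end of input */
        if (n == -1) {
            if (errno == EINTR) continue;
            return -1;
        }
        char *pos = buf;
        while (n > 0) {
            ssize_t w = write(fdout, pos, (size_t)n);
            if (w == -1) {
                if (errno == EINTR) continue;
                return -1;
            }
            pos += w;
            n -= w;
        }
    }
}

/* Returns 0 on success, -1 on error with errno set. */
int copy_file(const char *infile, const char *outfile) {
    int fdin = open(infile, O_RDONLY | O_BINARY);
    if (fdin == -1) return -1;

    /* 0777 is the same value as the 0x1ff used above */
    int fdout = open(outfile, O_WRONLY | O_BINARY | O_CREAT | O_TRUNC, 0777);
    if (fdout == -1) {
        int saved = errno;
        close(fdin);
        errno = saved;
        return -1;
    }

    int result = copy_fd(fdin, fdout);
    int saved = errno;
    close(fdin);
    /* close() can report a delayed write error, so check it too */
    if (close(fdout) == -1 && result == 0) {
        result = -1;
        saved = errno;
    }
    errno = saved;
    return result;
}
```

This sketch deliberately ignores all of the complications around existing files, directories and metadata.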

But there are confounding factors:

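- You need to special-case when the files are the same, and I can't remember how to do that portably.
- If the output filename is a directory, you might want to copy the file into the directory.
- If the output file already exists (open with O_EXCL to determine this and check for EEXIST on error), you might want to do something different, as cp -i does.
- You might want the permissions of the output file to reflect those of the input file.
- You might want other platform-specific metadata to be copied.
- You may or may not wish to unlink the output file on error.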
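Two of the items above can be sketched on POSIX systems as follows. This is an illustration, not a portable answer: `is_same_file` and `open_output_exclusive` are hypothetical names, and comparing st_dev and st_ino is a POSIX convention that does not carry over to every platform. The same-file check has to run before the output is opened with O_TRUNC, and there is an unavoidable race between the check and the open.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>

/* Returns 1 if both paths name the same file, 0 if they do not (or if
 * outfile does not exist yet), -1 on any other error. POSIX only. */
int is_same_file(const char *infile, const char *outfile) {
    struct stat a, b;
    if (stat(infile, &a) == -1) return -1;
    if (stat(outfile, &b) == -1) return errno == ENOENT ? 0 : -1;
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}

/* Create outfile only if it does not already exist. Returns a file
 * descriptor, or -1 on failure; errno == EEXIST means the file is
 * already there, so the caller can prompt the way cp -i does. */
int open_output_exclusive(const char *outfile, mode_t mode) {
    return open(outfile, O_WRONLY | O_CREAT | O_EXCL, mode);
}
```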

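Obviously the answers to all these questions could be "do the same as cp". In which case the answer to the original question is "ignore everything I or anyone else has said, and use the source of cp".

By the way, getting the cluster size of the file system is next to useless. You'll almost always see speed increasing with buffer size long after you've passed the size of a disk block.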

Other tips

This is the function I use when I need to copy from one file to another, shown with its test harness:

/*
@(#)File:           $RCSfile: fcopy.c,v $
@(#)Version:        $Revision: 1.11 $
@(#)Last changed:   $Date: 2008/02/11 07:28:06 $
@(#)Purpose:        Copy the rest of file1 to file2
@(#)Author:         J Leffler
@(#)Modified:       1991,1997,2000,2003,2005,2008
*/

/*TABSTOP=4*/

#include "jlss.h"
#include "stderr.h"

#ifndef lint
/* Prevent over-aggressive optimizers from eliminating ID string */
const char jlss_id_fcopy_c[] = "@(#)$Id: fcopy.c,v 1.11 2008/02/11 07:28:06 jleffler Exp $";
#endif /* lint */

void fcopy(FILE *f1, FILE *f2)
{
    char            buffer[BUFSIZ];
    size_t          n;

    while ((n = fread(buffer, sizeof(char), sizeof(buffer), f1)) > 0)
    {
        if (fwrite(buffer, sizeof(char), n, f2) != n)
            err_syserr("write failed\n");
    }
}

#ifdef TEST

int main(int argc, char **argv)
{
    FILE *fp1;
    FILE *fp2;

    err_setarg0(argv[0]);
    if (argc != 3)
        err_usage("from to");
    if ((fp1 = fopen(argv[1], "rb")) == 0)
        err_syserr("cannot open file %s for reading\n", argv[1]);
    if ((fp2 = fopen(argv[2], "wb")) == 0)
        err_syserr("cannot open file %s for writing\n", argv[2]);
    fcopy(fp1, fp2);
    return(0);
}

#endif /* TEST */

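Clearly, this version uses file pointers from standard I/O and not file descriptors, but it is reasonably efficient and about as portable as it can be.

Well, except for the error function; that is peculiar to me. As long as you handle errors cleanly, you should be OK. The "jlss.h" header declares fcopy(); the "stderr.h" header declares err_syserr() amongst many other similar error reporting functions. A simple version of the function follows; the real one adds the program name and does some other stuff.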

#include "stderr.h"
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

void err_syserr(const char *fmt, ...)
{
    int errnum = errno;
    va_list args;
    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
    if (errnum != 0)
        fprintf(stderr, "(%d: %s)\n", errnum, strerror(errnum));
    exit(1);
}

The code above may be treated as having a modern BSD licence or GPL v3, at your choice.
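For callers that cannot have the copy routine exit the program, one way to "handle errors cleanly" is to return a status instead. The variant below is a sketch, not part of the original code: `fcopy_status` is a hypothetical name. It also checks ferror() on the input, because fread() returning 0 does not by itself distinguish end of file from a read error.

```c
#include <stdio.h>

/* Copy the rest of f1 to f2.
 * Returns 0 on success, -1 on a read or write error. */
int fcopy_status(FILE *f1, FILE *f2)
{
    char   buffer[BUFSIZ];
    size_t n;

    while ((n = fread(buffer, sizeof(char), sizeof(buffer), f1)) > 0)
    {
        if (fwrite(buffer, sizeof(char), n, f2) != n)
            return -1;          /* short write */
    }
    if (ferror(f1))
        return -1;              /* loop ended on a read error, not EOF */
    /* push buffered output through so late write errors are reported */
    return (fflush(f2) == EOF) ? -1 : 0;
}
```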

The size of each read needs to be a multiple of 512 (the sector size); 4096 is a good choice.
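If you want to follow that advice, a buffer size can be rounded up to the next multiple of 512 with a small helper. This is only a sketch: `SECTOR_SIZE` and `round_up_to_sector` are hypothetical names, and 512 is the figure from the answer above, not something queried from the device.

```c
#include <stddef.h>

#define SECTOR_SIZE 512u /* assumed sector size, taken from the text above */

/* Round n up to the next multiple of SECTOR_SIZE.
 * Assumes n is small enough that n + SECTOR_SIZE - 1 does not overflow. */
size_t round_up_to_sector(size_t n) {
    return (n + SECTOR_SIZE - 1) / SECTOR_SIZE * SECTOR_SIZE;
}
```

For example, `round_up_to_sector(513)` gives 1024, and 4096 is returned unchanged.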

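Here is a very easy and clear example: Copy a file. Since it is written in ANSI C without any platform-specific function calls, I think it should be pretty portable.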

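Depending on what you mean by copying a file, it is certainly far from trivial. If you mean copying the content only, then there is almost nothing to do. But generally, you need to copy the metadata of the file, and that is surely platform dependent. I don't know of any C library which does what you want in a portable manner. Just handling the filename by itself is no trivial matter if you care about portability.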

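In C++, there is the file library in Boost.

One thing I found when implementing my own file copy, and it seems obvious but it's not: I/Os are slow. You can pretty much time your copy's speed by how many of them you do. So clearly you need to do as few of them as possible.

The best results I found were when I got myself a ginormous buffer, read the entire source file into it in one I/O, then wrote the entire buffer back out in one I/O. If I had to do it in even 10 batches, it got way slow. Trying to read and write out each byte, as a naive coder might try first, was just painful.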
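The whole-file approach described above can be sketched with standard I/O as follows. `copy_whole` is a hypothetical name. The sketch assumes both streams were opened in binary mode, that the input is seekable, and that the file fits in memory and in a `long`. Whether the library really issues a single I/O for each call is up to the implementation.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read all of `in` into one buffer, then write it to `out`.
 * Returns 0 on success, -1 on error. */
int copy_whole(FILE *in, FILE *out) {
    /* find the file size by seeking to the end */
    if (fseek(in, 0, SEEK_END) != 0) return -1;
    long size = ftell(in);
    if (size < 0) return -1;
    if (fseek(in, 0, SEEK_SET) != 0) return -1;
    if (size == 0) return 0; /* nothing to copy */

    char *buf = malloc((size_t)size);
    if (buf == NULL) return -1;

    /* one read request and one write request for the whole file */
    int ok = fread(buf, 1, (size_t)size, in) == (size_t)size
          && fwrite(buf, 1, (size_t)size, out) == (size_t)size;
    free(buf);
    return ok ? 0 : -1;
}
```

For very large files the allocation can fail or push the system into swapping, which is where a fixed-size buffer loop is the safer choice.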

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow