Tried and true simple file copying code in C?

https://stackoverflow.com/questions/1006797

06-07-2019

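Question

This looks like a simple question, but I didn't find anything similar here.

C has no file-copy function, so we have to implement file copying ourselves, but I don't like reinventing the wheel even for something this trivial, so I'd like to ask the cloud:

- What code would you recommend for copying a file using fopen() / fread() / fwrite()?
- What code would you recommend for copying a file using open() / read() / write()?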

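This code should be portable (windows / mac / linux / bsd / qnx / younameit), stable, time-tested, fast, memory efficient, and so on. Digging into a specific system's internals to squeeze out more performance is welcome (for example, getting the filesystem cluster size).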

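This seems like a trivial question, but, for example, the source code of the cp command is not 10 lines of C.

Solution

As far as the actual I/O goes, the code I've written hundreds of times, in various guises, for copying data from one stream to another goes something like this. It returns 0 on success, or -1 with errno set on error (in which case any number of bytes might have been copied).

Regular files always use blocking I/O, so if you are copying regular files you can skip the EAGAIN handling. But inevitably, if you write this code, somebody will use it on other types of file descriptors, so consider it a freebie.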

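There is a file-specific optimization that GNU cp does, which I haven't bothered with here: for long blocks of 0 bytes, instead of writing, you just extend the output file by seeking past the end.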

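#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

void block(int fd, int event) {
    struct pollfd topoll; // "struct" is required in C
    topoll.fd = fd;
    topoll.events = event;
    poll(&topoll, 1, -1);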
    // no need to check errors - if the stream is bust then the
    // next read/write will tell us
}

int copy_data_buffer(int fdin, int fdout, void *buf, size_t bufsize) {
    for(;;) {
       char *pos; // char *, because arithmetic on void * is not standard C
       // read data to buffer
       ssize_t bytestowrite = read(fdin, buf, bufsize);
       if (bytestowrite == 0) break; // end of input
       if (bytestowrite == -1) {
           if (errno == EINTR) continue; // signal handled
           if (errno == EAGAIN) {
               block(fdin, POLLIN);
               continue;
           }
           return -1; // error
       }

       // write data from buffer
       pos = buf;
       while (bytestowrite > 0) {
           ssize_t bytes_written = write(fdout, pos, bytestowrite);
           if (bytes_written == -1) {
               if (errno == EINTR) continue; // signal handled
               if (errno == EAGAIN) {
                   block(fdout, POLLOUT);
                   continue;
               }
               return -1; // error
           }
           bytestowrite -= bytes_written;
           pos += bytes_written;
       }
    }
    return 0; // success
}

// Default value. I think it will get close to maximum speed on most
// systems, short of using mmap etc. But porters / integrators
// might want to set it smaller, if the system is very memory
// constrained and they don't want this routine to starve
// concurrent ops of memory. And they might want to set it larger
// if I'm completely wrong and larger buffers improve performance.
// It's worth trying several MB at least once, although with huge
// allocations you have to watch for the linux 
// "crash on access instead of returning 0" behaviour for failed malloc.
#ifndef FILECOPY_BUFFER_SIZE
    #define FILECOPY_BUFFER_SIZE (64*1024)
#endif

int copy_data(int fdin, int fdout) {
    // optional exercise for reader: take the file size as a parameter,
    // and don't use a buffer any bigger than that. This prevents 
    // memory-hogging if FILECOPY_BUFFER_SIZE is very large and the file
    // is small.
    for (size_t bufsize = FILECOPY_BUFFER_SIZE; bufsize >= 256; bufsize /= 2) {
        void *buffer = malloc(bufsize);
        if (buffer != NULL) {
            int result = copy_data_buffer(fdin, fdout, buffer, bufsize);
            free(buffer);
            return result;
        }
    }
    // could use a stack buffer here instead of failing, if desired.
    // 128 bytes ought to fit on any stack worth having, but again
    // this could be made configurable.
    return -1; // errno is ENOMEM
}

To open the input file:

int fdin = open(infile, O_RDONLY|O_BINARY, 0);
if (fdin == -1) return -1;

Opening the output file is trickier. As a basis, you want:

int fdout = open(outfile, O_WRONLY|O_BINARY|O_CREAT|O_TRUNC, 0777); // 0777 == 0x1ff
if (fdout == -1) {
    close(fdin);
    return -1;
}

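But there are confounding factors:

- You need to special-case the situation where the two files are the same, and I can't remember how to do that portably.
- If the output filename is a directory, you might want to copy the file into the directory.
- If the output file already exists (open with O_EXCL to determine this, and check for EEXIST on error), you might want to do something different, as cp -i does.
- You might want the permissions of the output file to reflect those of the input file.
- You might want other platform-specific metadata to be copied.
- You may or may not wish to unlink the output file on error.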
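
Two of the points above can be sketched for POSIX systems. This is an illustration under stated assumptions, not a portable answer: comparing st_dev and st_ino is a POSIX convention that does not carry over to Windows, and the check is inherently racy because the files can change between the stat() calls and the open(). The names `same_file` and `open_new_output` are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if both paths name the same file,
 * 0 if they differ or outfile does not exist yet, -1 on error.
 * Call it BEFORE opening outfile with O_TRUNC, otherwise the input
 * would already have been truncated. */
int same_file(const char *infile, const char *outfile) {
    struct stat a, b;
    if (stat(infile, &a) == -1) return -1;
    if (stat(outfile, &b) == -1) return errno == ENOENT ? 0 : -1;
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}

/* Hypothetical helper: create outfile only if it does not exist,
 * requesting the permission bits of the already-open input file
 * (the process umask still applies). Returns the new descriptor,
 * or -1 with errno == EEXIST if the file is already there. */
int open_new_output(const char *outfile, int fdin) {
    struct stat st;
    if (fstat(fdin, &st) == -1) return -1;
    return open(outfile, O_WRONLY | O_CREAT | O_EXCL, st.st_mode & 0777);
}
```

A caller that gets EEXIST can then decide whether to prompt, overwrite with O_TRUNC, or give up, which is the choice cp -i leaves to the user.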

Obviously, the answer to all of these questions could be "do the same as cp". In that case the answer to the original question is "ignore everything I or anyone else has said, and use the source of cp".

By the way, getting the cluster size of the filesystem is next to useless. You will almost always see speed keep increasing with buffer size long after you have passed the size of a disk block.

Other tips

This is the function I use when I need to copy from one file to another, shown here with its test harness:

/*
@(#)File:           $RCSfile: fcopy.c,v $
@(#)Version:        $Revision: 1.11 $
@(#)Last changed:   $Date: 2008/02/11 07:28:06 $
@(#)Purpose:        Copy the rest of file1 to file2
@(#)Author:         J Leffler
@(#)Modified:       1991,1997,2000,2003,2005,2008
*/

/*TABSTOP=4*/

#include "jlss.h"
#include "stderr.h"

#ifndef lint
/* Prevent over-aggressive optimizers from eliminating ID string */
const char jlss_id_fcopy_c[] = "@(#)$Id: fcopy.c,v 1.11 2008/02/11 07:28:06 jleffler Exp $";
#endif /* lint */

void fcopy(FILE *f1, FILE *f2)
{
    char            buffer[BUFSIZ];
    size_t          n;

    while ((n = fread(buffer, sizeof(char), sizeof(buffer), f1)) > 0)
    {
        if (fwrite(buffer, sizeof(char), n, f2) != n)
            err_syserr("write failed\n");
    }
    /* fread() returns 0 on error as well as at end of file */
    if (ferror(f1))
        err_syserr("read failed\n");
}

#ifdef TEST

int main(int argc, char **argv)
{
    FILE *fp1;
    FILE *fp2;

    err_setarg0(argv[0]);
    if (argc != 3)
        err_usage("from to");
    if ((fp1 = fopen(argv[1], "rb")) == 0)
        err_syserr("cannot open file %s for reading\n", argv[1]);
    if ((fp2 = fopen(argv[2], "wb")) == 0)
        err_syserr("cannot open file %s for writing\n", argv[2]);
    fcopy(fp1, fp2);
    return(0);
}

#endif /* TEST */

Clearly, this version uses file pointers from standard I/O rather than file descriptors, but it is reasonably efficient and about as portable as it can be.

Well, except for the error function - that is peculiar to me. As long as you handle errors cleanly, you should be fine. The "jlss.h" header declares fcopy(); the "stderr.h" header declares err_syserr() among many other similar error reporting functions. A simple version of the function follows - the real one adds the program name and does some other things.

#include "stderr.h"
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void err_syserr(const char *fmt, ...)
{
    int errnum = errno;
    va_list args;
    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
    if (errnum != 0)
        fprintf(stderr, "(%d: %s)\n", errnum, strerror(errnum));
    exit(1);
}

The code above may be treated as being under a modern BSD license or GPL v3, at your choice.

The size of each read should be a multiple of 512 (the sector size); 4096 is a good choice.
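
If you want to follow that advice for an arbitrary requested size, a small rounding helper is enough. This is a sketch only: the 512-byte sector size is the figure assumed in the text rather than something queried from the device, and `round_up_to_sector` is a hypothetical name.

```c
#include <stddef.h>

#define SECTOR_SIZE 512u /* assumed sector size, taken from the text */

/* Hypothetical helper: round a requested buffer size up to the next
 * multiple of SECTOR_SIZE, with a minimum of one sector. It does not
 * guard against overflow for values close to SIZE_MAX. */
size_t round_up_to_sector(size_t n) {
    if (n == 0) return SECTOR_SIZE;
    return (n + SECTOR_SIZE - 1) / SECTOR_SIZE * SECTOR_SIZE;
}
```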

Here is a very simple and clear example: Copy a File. It is written in ANSI C without any platform-specific function calls, so I think it should be fairly portable.
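
The example that answer refers to is not reproduced here. As a rough idea of what a copy using only the standard C library can look like, here is a minimal sketch; `copy_file` is a hypothetical name and this is not the referenced code.

```c
#include <stdio.h>

/* Hypothetical helper: copy src to dst using only standard C I/O.
 * Returns 0 on success, -1 on any open, read, write or close failure. */
int copy_file(const char *src, const char *dst) {
    char buf[BUFSIZ];
    size_t n;
    int ok = 1;
    FILE *in = fopen(src, "rb");
    FILE *out;

    if (in == NULL) return -1;
    out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            ok = 0; /* short write */
            break;
        }
    }
    if (ferror(in)) ok = 0;        /* fread stopped because of an error */
    fclose(in);
    if (fclose(out) != 0) ok = 0;  /* flushing buffered data can fail too */
    return ok ? 0 : -1;
}
```

The binary modes "rb" and "wb" matter on platforms that translate line endings in text mode. The copy handles content only; none of the metadata is carried over.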

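Depending on what you mean by copying a file, it is far from trivial. If you only mean copying the contents, there is almost nothing to do. But in general you also need to copy the file's metadata, and that is certainly platform dependent. I don't know of any C library that does what you want in a portable way. Even handling the filename by itself is not trivial if you care about portability.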
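
To illustrate how platform-bound even the simplest piece of metadata is, here is a POSIX-only sketch that copies just the permission bits between two open descriptors. `copy_permissions` is a hypothetical name; ownership, timestamps, ACLs and extended attributes are deliberately left out, and none of this applies to Windows.

```c
#include <sys/stat.h>

/* Hypothetical helper: give fdout the same permission bits as fdin.
 * Returns 0 on success, -1 with errno set on error. */
int copy_permissions(int fdin, int fdout) {
    struct stat st;
    if (fstat(fdin, &st) == -1) return -1;
    return fchmod(fdout, st.st_mode & 07777);
}
```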

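In C++, there is a file library in Boost.

One thing I found when implementing my own file copy, which seems obvious but isn't: I/O operations are slow. You can pretty much predict your copy's speed by how many of them you do. So clearly you need to do as few of them as possible.

The best results I got were when I allocated a huge buffer, read the entire source file into it in one I/O, and then wrote the entire buffer back out in one I/O. Even when I had to do it in 10 batches, it got much slower. Trying to read and write each byte individually, as a naive coder might try first, was just painful.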
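
A minimal sketch of that whole-file approach in standard C, under some assumptions: the file fits in memory, its size fits in a long, and fseek()/ftell() report the size of a binary stream (common in practice, but not guaranteed by the C standard). `copy_whole` is a hypothetical name, and stdio may still split one large fread() or fwrite() into several system calls.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read all of src into one buffer, then write the
 * buffer to dst. Returns 0 on success, -1 on failure. */
int copy_whole(const char *src, const char *dst) {
    FILE *in = fopen(src, "rb");
    FILE *out = NULL;
    char *buf = NULL;
    long size = 0;
    int result = -1;

    if (in == NULL) return -1;
    /* Find the file size by seeking to the end, then rewind. */
    if (fseek(in, 0, SEEK_END) != 0) goto done;
    size = ftell(in);
    if (size < 0 || fseek(in, 0, SEEK_SET) != 0) goto done;

    buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL) goto done;
    if (fread(buf, 1, (size_t)size, in) != (size_t)size) goto done;

    out = fopen(dst, "wb");
    if (out == NULL) goto done;
    if (fwrite(buf, 1, (size_t)size, out) != (size_t)size) goto done;
    result = 0;

done:
    free(buf);
    if (out != NULL && fclose(out) != 0) result = -1;
    fclose(in);
    return result;
}
```

For very large files this trades memory for fewer I/O calls, so a fallback to a fixed-size buffer loop is a sensible addition when malloc() fails.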

License: CC-BY-SA with attribution

Not affiliated with StackOverflow