Question

I have to read (scan) a file sequentially and process its contents. The file size can range from very small (a few KB) to very large (a few GB).

I tried two techniques, using VC10/VS2010 on Windows 7 64-bit:

  1. Win32 memory-mapped files (i.e. CreateFile, CreateFileMapping, MapViewOfFile, etc.).
  2. fopen and fread from the CRT (a minimal read loop is sketched after this list).
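
For reference, a minimal sketch of the CRT technique (the 64 KB buffer size and the processing placeholder are assumptions, not part of the original code):

#include <cstdio>
#include <vector>

void ScanWithCrt(const char* filename)
{
    FILE* f = fopen(filename, "rb");
    if (!f)
        return;

    std::vector<char> buffer(64 * 1024);   // arbitrary chunk size
    size_t n;
    while ((n = fread(buffer.data(), 1, buffer.size(), f)) > 0)
    {
        // process buffer[0..n) here
    }
    fclose(f);
}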

I thought that the memory-mapped file technique could be faster than the CRT functions, but some tests showed that the speed is almost the same in both cases.

The following C++ statements are used for the MMF approach:

HANDLE hFile = CreateFile(
    filename,
    GENERIC_READ,
    FILE_SHARE_READ,
    NULL,
    OPEN_EXISTING,
    FILE_FLAG_SEQUENTIAL_SCAN,
    NULL
    );

HANDLE hFileMapping = CreateFileMapping(
    hFile,
    NULL,
    PAGE_READONLY,
    0,
    0,
    NULL
    );

The file is read sequentially, chunk by chunk; each chunk is SYSTEM_INFO.dwAllocationGranularity in size (a sketch of the mapping loop is shown below).
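
For illustration, such a chunked mapping loop, assuming the handles created above (the error handling and the processing step are placeholders):

SYSTEM_INFO si;
GetSystemInfo(&si);
const DWORD granularity = si.dwAllocationGranularity;

LARGE_INTEGER fileSize;
GetFileSizeEx(hFile, &fileSize);

for (ULONGLONG offset = 0; offset < (ULONGLONG)fileSize.QuadPart; offset += granularity)
{
    ULONGLONG remaining = (ULONGLONG)fileSize.QuadPart - offset;
    SIZE_T chunkSize = (SIZE_T)(remaining < granularity ? remaining : granularity);

    const char* view = (const char*)MapViewOfFile(
        hFileMapping,
        FILE_MAP_READ,
        (DWORD)(offset >> 32),           // high 32 bits of the file offset
        (DWORD)(offset & 0xFFFFFFFF),    // low 32 bits of the file offset
        chunkSize);
    if (view == NULL)
        break;

    // process chunkSize bytes starting at view here

    UnmapViewOfFile(view);
}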

Considering that the speed is almost the same with MMF and CRT, I would use the CRT functions because they are simpler and multi-platform. But I am curious: am I using the MMF technique correctly? Is it normal that MMF performance, in this case of sequentially scanning a file, is the same as the CRT one?

Thanks.


Solution

I believe you won't see much difference if you access the file sequentially, because file I/O is very heavily cached and read-ahead is probably also used.

Things would be different if you had many "jumps" during the file data processing. Then setting a new file pointer and reading a new file portion each time would probably hurt the CRT approach, whereas MMF would give you the maximum possible performance.
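
As a rough illustration of that difference (the function names, offsets and sizes here are hypothetical), one "jump" with each approach looks like this:

#include <cstdio>

// CRT: every jump costs a seek plus a read call.
void ReadRecordCrt(FILE* f, __int64 offset, char* out, size_t size)
{
    _fseeki64(f, offset, SEEK_SET);
    fread(out, 1, size, f);
}

// MMF: once the view is mapped, a jump is plain pointer arithmetic;
// the memory manager pages the data in on demand.
const char* ReadRecordMmf(const char* viewBase, __int64 offset)
{
    return viewBase + offset;
}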

Other tips

Since you are scanning the file sequentially, I would not expect the disk usage pattern to be much different for either method.

For large files, MMF might reduce data locality and even result in a copy of all or part of the file being placed in the pagefile, whereas processing via CRT using a small buffer would all take place in RAM. In this instance, MMF would probably be slower. You can mitigate this by only mapping in part of the underlying file at a time, but then things get more complex without any likely win over direct sequential I/O.

MMFs are really the way Windows implements inter-process shared memory, rather than a way to speed up generalized file I/O. The file manager cache in the kernel is what you really need to leverage here.

I thought that the memory-mapped file technique could be faster than the CRT functions, but some tests showed that the speed is almost the same in both cases.

You are probably hitting the file system cache for your tests. Unless you explicitly create file handles to bypass the file system cache (FILE_FLAG_NO_BUFFERING when calling CreateFile), the file system cache will kick in and keep recently accessed files in memory.
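
For example, bypassing the cache would look roughly like the sketch below; note that FILE_FLAG_NO_BUFFERING requires the buffer address, read size and file offsets to be multiples of the volume sector size (the 64 KB read size is just an assumption):

HANDLE hRawFile = CreateFile(
    filename,
    GENERIC_READ,
    FILE_SHARE_READ,
    NULL,
    OPEN_EXISTING,
    FILE_FLAG_NO_BUFFERING,     // bypass the file system cache
    NULL);

// VirtualAlloc returns page-aligned memory, which satisfies the
// sector-size alignment requirement on typical volumes.
const DWORD readSize = 64 * 1024;
void* buffer = VirtualAlloc(NULL, readSize, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

DWORD bytesRead = 0;
while (ReadFile(hRawFile, buffer, readSize, &bytesRead, NULL) && bytesRead > 0)
{
    // process bytesRead bytes here
}

VirtualFree(buffer, 0, MEM_RELEASE);
CloseHandle(hRawFile);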

There is a small speed penalty when reading a file that is already in the file system cache through the buffered path, as the operating system has to perform an extra copy into your buffer, plus there is system call overhead. But for your purposes, you should probably stick with the CRT file functions.

Gustavo Duarte has a great article on memory mapped files (from a generic OS perspective).

Both methods will eventually come down to disk I/O; that will be your bottleneck. I would go with the method that my higher-level functionality likes more: if I need streaming, I'll go with files; if I need sequential access and fixed-size files, I would consider memory-mapped files.

Or, in the case where you have an algorithm that works only on memory, memory-mapped files can be the easier way out.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow