Question

I have a file with space-separated numbers. Its size is about 1 GB, and I want to read the numbers from it. I decided to use memory-mapped files for fast reading, but I don't understand how to do it. I tried the following:

var mmf = MemoryMappedFile.CreateFromFile("test", FileMode.Open, "myFile");
var mmfa = mmf.CreateViewAccessor(0, 0, MemoryMappedFileAccess.Read);
var nums = new int[6];
var a = mmfa.ReadArray<int>(0, nums, 0, 6); 

But if "test" contains just "01", I get 12337 in nums[0] (12337 = 48*256+49). I've searched the internet but found nothing that answers my question, only material about byte arrays or interprocess communication. Can you show me how to get 1 in nums[0]?


Solution

The following example reads ASCII integers from a memory-mapped file without creating any strings, which is the fastest approach I found. The solution provided by MiMo is much slower: it runs at 5 MB/s, which will not help you much. Its biggest issue is that it calls a method (Read) for every char, which costs a whopping factor of 15 in performance. I wonder why you accepted his solution if your original issue was performance. You can get 20 MB/s with a dumb string reader that parses each string into an integer. Fetching every byte via a method call ruins the read performance you could otherwise get.

The code below maps the file in 200 MB chunks to avoid filling up the 32-bit address space. It then scans through the buffer with a byte pointer, which is very fast. The integer parsing is easy if you do not take localization into account. What is interesting is that when I create a view of the mapping, the only way to get a pointer to the view buffer does not let me start at the mapped region.

I would consider this a bug in the .NET Framework, and it is still not fixed in .NET 4.5. The SafeMemoryMappedViewHandle buffer is allocated with the allocation granularity of the OS. If you request a view at some offset, the pointer you get back still points to the start of the buffer. This is really unfortunate, because pointer access makes the difference between 5 MB/s and 77 MB/s in parsing performance.
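To make the correction concrete, here is a minimal sketch of the arithmetic only. ViewOffset and Split are hypothetical names used for illustration, and the granularity is passed in as a parameter rather than queried from the OS (on Windows it is commonly 64 KB, but that is an assumption to verify with GetSystemInfo).

```csharp
using System;

static class ViewOffset
{
    // Splits a requested file offset into:
    //  - alignedOffset: where the underlying mapping starts (a multiple of the granularity)
    //  - correction:    how many bytes to skip from the acquired pointer to reach the requested offset
    // granularity is an input here; it is an assumption that the caller obtained it from the OS.
    public static void Split(long requestedOffset, int granularity,
                             out long alignedOffset, out long correction)
    {
        correction = requestedOffset % granularity;
        alignedOffset = requestedOffset - correction;
    }
}
```

For a requested offset of 200,000,000 and a granularity of 65,536, the mapping would start at 199,950,336 and the pointer would have to be advanced by 49,664 bytes. As far as I know, newer framework versions (4.5.1 onward) add a PointerOffset property on the accessor that is meant to expose this correction; check the documentation for your target version before relying on it.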

Did read 258.888.890 bytes with 77 MB/s


using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

unsafe class Program
{
    static void Main(string[] args)
    {
        new Program().Start();
    }

    private void Start()
    {
        var sw = Stopwatch.StartNew();
        string fileName = @"C:\Source\BigFile.txt";//@"C:\Source\Numbers.txt";
        var file = MemoryMappedFile.CreateFromFile(fileName);
        var fileSize = new FileInfo(fileName).Length;
        int viewSize = 200 * 1000 * 1000; // 200 MB per view
        long offset = 0;
        for (; offset < fileSize-viewSize; offset +=viewSize ) // create 200 MB views
        {
            using (var accessor = file.CreateViewAccessor(offset, viewSize))
            {
                int unReadBytes = ReadData(accessor, offset);
                offset -= unReadBytes;
            }
        }

        using (var rest = file.CreateViewAccessor(offset, fileSize - offset))
        {
            ReadData(rest, offset);
        }
        sw.Stop();
        Console.WriteLine("Did read {0:N0} bytes with {1:F0} MB/s", fileSize, (fileSize / (1024 * 1024)) / sw.Elapsed.TotalSeconds);
    }


    List<int> Data = new List<int>();

    private int ReadData(MemoryMappedViewAccessor accessor, long offset)
    {
        using(var safeViewHandle = accessor.SafeMemoryMappedViewHandle)
        {
            byte* pStart = null;
            safeViewHandle.AcquirePointer(ref pStart);
            ulong correction = 0;
            // needed to correct offset because the view handle does not start at the offset specified in the CreateViewAccessor call
            // This makes AcquirePointer nearly useless.
            // http://connect.microsoft.com/VisualStudio/feedback/details/537635/no-way-to-determine-internal-offset-used-by-memorymappedviewaccessor-makes-safememorymappedviewhandle-property-unusable
            pStart = Helper.Pointer(pStart, offset, out correction);
            var len = safeViewHandle.ByteLength - correction;
            bool digitFound = false;
            int curInt = 0;
            byte current =0;
            for (ulong i = 0; i < len; i++)
            {
                current = *(pStart + i);
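                if (current == (byte)' ')
                {
                    // a space ends the current number; repeated spaces are skipped
                    if (digitFound)
                    {
                        Data.Add(curInt);
                      //  Console.WriteLine("Add {0}", curInt);
                        digitFound = false;
                        curInt = 0;
                    }
                }
                else
                {
                    // assumes every non-space byte is an ASCII digit
                    curInt = curInt * 10 + (current - '0');
                    digitFound = true;
                }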
            }

            // scan backwards to find partial read number
            int unread = 0;
            if (digitFound) // also covers a partially read number whose digits so far are all 0
            {
                byte* pEnd = pStart + len;
                while (true)
                {
                    pEnd--;
                    if (*pEnd == (byte)' ' || pEnd == pStart)
                    {
                        break;
                    }
                    unread++;

                }
            }

            safeViewHandle.ReleasePointer();
            return unread;
        }
    }

    public unsafe static class Helper
    {
        static SYSTEM_INFO info;

        static Helper()
        {
            GetSystemInfo(ref info);
        }

        public static byte* Pointer(byte *pByte, long offset, out ulong diff)
        {
            var num = offset % info.dwAllocationGranularity;
            diff = (ulong)num; // return difference

            byte* tmp_ptr = pByte;

            tmp_ptr += num;

            return tmp_ptr;
        }

        [DllImport("kernel32.dll", SetLastError = true)]
        internal static extern void GetSystemInfo(ref SYSTEM_INFO lpSystemInfo);

        internal struct SYSTEM_INFO
        {
            internal int dwOemId;
            internal int dwPageSize;
            internal IntPtr lpMinimumApplicationAddress;
            internal IntPtr lpMaximumApplicationAddress;
            internal IntPtr dwActiveProcessorMask;
            internal int dwNumberOfProcessors;
            internal int dwProcessorType;
            internal int dwAllocationGranularity;
            internal short wProcessorLevel;
            internal short wProcessorRevision;
        }
    }

    void GenerateNumbers()
    {
        using (var file = File.CreateText(@"C:\Source\BigFile.txt"))
        {
            for (int i = 0; i < 30 * 1000 * 1000; i++)
            {
                file.Write(i.ToString() + " ");
            }
        }
    }

}

Other tips

You need to parse the file content, converting the characters into numbers - something like this:

List<int> nums = new List<int>();
long curPos = 0;
int curV = 0;
bool hasCurV = false;
while (curPos < mmfa.Capacity) {
  byte c;
  mmfa.Read(curPos++, out c);
  if (c == 0) {
    break;
  }
  if (c == 32) {
    if (hasCurV) {
      nums.Add(curV);
      curV = 0;
    }
    hasCurV = false;
  } else {
    curV = checked(curV*10 + (int)(c-48));
    hasCurV = true;
  }
}
if (hasCurV) {
  nums.Add(curV);
}

This assumes that mmfa.Capacity is the total number of characters to read and that the file contains only digits separated by spaces (i.e. no line endings or other whitespace).

48 = 0x30 = '0', 49 = 0x31 = '1'

So you really are getting your characters; they are just ASCII-encoded.

The string "01" takes 2 bytes, which fit into one int, so both are served to you in a single int. To get them separately, you need to ask for an array of bytes.
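As a small illustration of the byte-wise view, the sketch below converts ASCII digit bytes to their numeric values by subtracting '0'. AsciiDigits and ToValues are hypothetical names, and the byte array stands in for whatever you read from the accessor.

```csharp
using System;

static class AsciiDigits
{
    // Converts each ASCII digit byte to its numeric value by subtracting '0' (48).
    // Throws if a byte is not an ASCII digit, since this sketch handles digits only.
    public static int[] ToValues(byte[] raw)
    {
        var values = new int[raw.Length];
        for (int i = 0; i < raw.Length; i++)
        {
            if (raw[i] < (byte)'0' || raw[i] > (byte)'9')
                throw new FormatException("Not an ASCII digit: " + raw[i]);
            values[i] = raw[i] - (byte)'0';
        }
        return values;
    }
}
```

Passing the two bytes of "01" (48 and 49) yields the values 0 and 1; combining them into a single number is then the usual value = value * 10 + digit step.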


Edit: if "01" needs to be parsed into the number 1, i.e. converted from its ASCII representation into binary, you need to go another way. I would suggest:

  1. do not use memory mapped file,
  2. read a file with StreamReader line by line (see example here)
  3. split each line to chunks using string.Split
  4. parse each chunk into a number using int.Parse
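
The steps above could be sketched roughly as follows. LineParser and ReadNumbers are hypothetical names; the method takes a TextReader so that a StreamReader over the real file or a StringReader can be passed in, and it assumes the file contains only non-negative integers separated by spaces.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

static class LineParser
{
    // Reads the input line by line, splits each line on spaces,
    // and parses every non-empty chunk as an integer.
    public static List<int> ReadNumbers(TextReader reader)
    {
        var numbers = new List<int>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] chunks = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            foreach (string chunk in chunks)
            {
                // Invariant culture so the result does not depend on the machine's locale
                numbers.Add(int.Parse(chunk, CultureInfo.InvariantCulture));
            }
        }
        return numbers;
    }
}
```

With a real file you would call it as LineParser.ReadNumbers(new StreamReader(path)) inside a using block. Note that if the whole 1 GB file is a single line, ReadLine has to load all of it into one string, so this approach only works well when the file has line breaks.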
License: CC-BY-SA with attribution
Not affiliated with StackOverflow