Question

I'm reading in a file (the file consists of one long string which is 2 GB in length).

This is my function, which reads all the contents of the file into memory, splits the string into chunks, and places them in a list. (_reader is a StreamReader.)

    public List<char[]> GetAllContentAsList()
    {
        int bytesToRead = 1000000;
        char[] buffer = new char[bytesToRead];
        List<char[]> results = new List<char[]>();

        while (_reader.Read(buffer, 0, bytesToRead) != 0)
        {
            char[] temp = new char[bytesToRead];
            Array.Copy(buffer, temp, bytesToRead);
            results.Add(temp);
        }

        return results;
    }

When all the data is placed into the List, it takes up 4 GB of RAM. How is this possible when the file is only 2 GB in size?

Edit:

This is what I ended up doing. I'm not converting the array of bytes to a string; I'm just passing the bytes on and manipulating them. This way the file is only 2 GB in memory instead of 4 GB. (Note that _reader here must be a Stream, e.g. a FileStream, rather than a StreamReader, since Read now fills a byte[].)

    public List<byte[]> GetAllContentAsList()
    {
        int bytesToRead = 1000000;
        var buffer = new byte[bytesToRead];
        List<byte[]> results = new List<byte[]>();

        while (_reader.Read(buffer, 0, bytesToRead) != 0)
        {
            //string temp = Encoding.UTF8.GetString(buffer);
            byte[] b = new byte[bytesToRead];
            Array.Copy(buffer, b, bytesToRead);
            results.Add(b);
        }

        return results;
    }

Solution

Educated guess here:

The file is UTF-8 or ASCII encoded and only (or mostly) contains single-byte-wide characters (or possibly some other code page that is mostly single byte wide).

Now, .NET characters are UTF-16, which are all 2 (or more) bytes in length.

So, in memory the characters will be double the size.
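
To see the doubling concretely, here is a small illustrative snippet (not from the original answer) comparing the on-disk size of an ASCII character under UTF-8 with the in-memory size of a .NET char:

    using System;
    using System.Text;

    class CharSizeDemo
    {
        static void Main()
        {
            string s = "A"; // a plain ASCII character

            // On disk as UTF-8/ASCII: 1 byte.
            Console.WriteLine(Encoding.UTF8.GetByteCount(s)); // prints 1

            // In memory as a .NET (UTF-16) char: 2 bytes.
            Console.WriteLine(sizeof(char)); // prints 2

            // Hence ~2 GB of single-byte characters on disk
            // becomes ~4 GB of char data in RAM.
        }
    }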

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow