Question

EDIT: @Everyone Sorry, I feel silly getting mixed up with the size of int32. Question could be closed, but since there are several answers already, I selected the first one. Original question is below for reference


I am looking for a way to load a specific line from very large text files, and I was planning on using File.ReadLines and the Skip() method:

File.ReadLines(fileName).Skip(nbLines).Take(1).ToArray();

Problem is, Skip() takes an int value, and int values are limited to 2 million or so. Should be fine for most files, but what if the file contains, say 20 million lines? I tried using a long, but no overload of Skip() accepts longs.

Lines are of variable, unknown length so I can't count the bytes.

Is there an option that doesn't involve reading line by line or splitting the file in chunks? This operation must be very fast.


Solution

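In C#, int is a signed 32-bit integer, so its maximum is int.MaxValue = 2,147,483,647: about 2 billion, not 2 million. A file with 20 million lines is well within the range of Skip().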

That said, if you have to read a random line from the file, and all you know is that the file has lines, you will have to read it line by line until you reach the line you want. You can use some buffers to ease up on the I/O a little bit (they're on by default), but you won't get any better performance than that.
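A minimal sketch of that sequential scan, assuming a zero-based line index and the default buffering of StreamReader. The class name LineReader and the method ReadLineAt are made up for this illustration; they are not part of the .NET library.

```csharp
using System;
using System.IO;

public static class LineReader
{
    // Hypothetical helper: returns the line at the zero-based lineIndex,
    // or null if the file has fewer lines than that.
    public static string ReadLineAt(string path, long lineIndex)
    {
        if (lineIndex < 0) throw new ArgumentOutOfRangeException(nameof(lineIndex));

        // StreamReader buffers its reads, so this is one pass over the file.
        using (var reader = new StreamReader(path))
        {
            string line;
            long current = 0;
            while ((line = reader.ReadLine()) != null)
            {
                if (current++ == lineIndex) return line;
            }
        }
        return null;
    }
}
```

Because the counter is a long, this also works past int.MaxValue lines, but the cost stays proportional to the position of the requested line.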

Unless you change the way the file is saved. If you could create an index file containing the byte position of each line in the main file, you could make reading a line infinitely faster.

Well, not infinitely, but a lot faster: from O(N) to almost O(1) (almost, because seeking to a random byte in a file may not be an O(1) operation, depending on how the OS does it).
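One possible shape for such an index, as a rough sketch only. It assumes the main file is UTF-8 (or another encoding where the byte 0x0A only ever means a line feed), that lines end in \n or \r\n, and that the index is rebuilt whenever the main file changes. The index layout (one 8-byte offset per line) and the names LineIndex, Build and ReadLine are invented for this example.

```csharp
using System;
using System.IO;
using System.Text;

public static class LineIndex
{
    // Hypothetical index format: for every line, the 8-byte offset at which
    // that line starts in the data file. Building it is a one-time O(N) pass.
    public static void Build(string dataPath, string indexPath)
    {
        using (var data = new BufferedStream(File.OpenRead(dataPath)))
        using (var index = new BinaryWriter(File.Create(indexPath)))
        {
            long position = 0;
            bool atLineStart = true;
            int b;
            while ((b = data.ReadByte()) != -1)
            {
                if (atLineStart)
                {
                    index.Write(position);   // this byte begins a new line
                    atLineStart = false;
                }
                position++;
                if (b == '\n') atLineStart = true;
            }
        }
    }

    // Looks up the offset of the zero-based lineIndex, seeks there and reads
    // one line. Returns null if the index has no entry for that line.
    public static string ReadLine(string dataPath, string indexPath, long lineIndex)
    {
        long offset;
        using (var index = new BinaryReader(File.OpenRead(indexPath)))
        {
            long entry = lineIndex * sizeof(long);
            if (lineIndex < 0 || entry >= index.BaseStream.Length) return null;
            index.BaseStream.Seek(entry, SeekOrigin.Begin);
            offset = index.ReadInt64();
        }

        using (var data = File.OpenRead(dataPath))
        {
            data.Seek(offset, SeekOrigin.Begin);
            using (var reader = new StreamReader(data, Encoding.UTF8))
            {
                return reader.ReadLine();
            }
        }
    }
}
```

After Build has run once, each lookup costs two seeks and one short read regardless of the line number, which is where the "almost O(1)" comes from.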

Other tips

I voted to close your question because your premises are incorrect. However, were this a real problem, there's nothing to stop you writing your own Skip extension method that takes a long instead of an int:

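using System.Collections.Generic;

public static class SkipEx
{
    // Same idea as Enumerable.Skip, but the count is a long instead of an int.
    public static IEnumerable<T> LongSkip<T>(this IEnumerable<T> src,
                                             long numToSkip)
    {
        long counter = 0L;
        foreach (var item in src)
        {
            // Drop items until numToSkip of them have been passed over.
            if (counter++ < numToSkip) continue;
            yield return item;
        }
    }
}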

so now you can do such craziness as

File.ReadLines(filename).LongSkip(100000000000L)

without problems (and come back next year...). Tada!

Int values are limited to around 2 billion, not 2 million. So unless your file is going to have more than about 2.1 billion lines (int.MaxValue is 2,147,483,647), you should be fine.

You can always use SkipWhile and TakeWhile and write your own predicates.
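A sketch of what such a predicate could look like. The index-aware overload of SkipWhile passes an int index, so this version keeps its own long counter in a captured variable instead. The class SkipWhileExample and the method LineAt are names chosen for this example only.

```csharp
using System;
using System.IO;
using System.Linq;

public static class SkipWhileExample
{
    // Hypothetical helper: returns the line at the zero-based lineIndex,
    // or null if the file is shorter than that.
    public static string LineAt(string path, long lineIndex)
    {
        long seen = 0;
        return File.ReadLines(path)
                   // The predicate counts lines itself, so the limit can be a long.
                   .SkipWhile(_ => seen++ < lineIndex)
                   .FirstOrDefault();
    }
}
```

The predicate has a side effect (it increments seen), so the query should be enumerated only once, as it is here. It still reads the file line by line, like Skip() does.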

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow