Question

I have a large text file whose records are separated by a string (not by a single character), like this:

first data[STRING-SEPERATOR]second data[STRING-SEPERATOR] ...

I don't want to load the entire file into memory because of its size (~250 MB). If I read the whole file with System.IO.File.ReadAllText, I get an OutOfMemoryException.

Therefore I want to read the file up to the first occurrence of [STRING-SEPERATOR], process that piece, and then continue with the next one. It's like "taking" the first record off the front of the file, processing it, and then going on with the second record, which is now the first one in the file.

System.IO.StreamReader.ReadLine() doesn't help me because the whole file is a single line.

Do you have an idea how to read a file up to a certain string in .NET?

I'm hoping for some ideas. Thank you.

Solution 3

Thank you for your replies. Here's the function I wrote in VB.NET:

Public Function ReadUntil(Stream As System.IO.FileStream, UntilText As String) As String
    ' Returns the text from the current position up to (not including) the next
    ' occurrence of UntilText and leaves the stream positioned right after it.
    ' At end of file, returns whatever text is left ("" when nothing is left).
    ' Assumes a single-byte encoding (or ASCII-only text): the seek-back below
    ' counts characters as bytes.
    Dim builder As New System.Text.StringBuilder()            ' recent chars that may contain the separator
    Dim returnTextBuilder As New System.Text.StringBuilder()  ' everything read during this call
    Dim buffer(4095) As Byte                                   ' 4096-byte buffer; any size >= 1 works
    Dim currentRead As Integer = -1

    Do Until currentRead = 0
        currentRead = Stream.Read(buffer, 0, buffer.Length)
        Dim chars As String = System.Text.Encoding.Default.GetString(buffer, 0, currentRead)

        builder.Append(chars)
        returnTextBuilder.Append(chars)

        Dim collected As String = builder.ToString()
        If collected.IndexOf(UntilText, System.StringComparison.Ordinal) >= 0 Then
            Dim returnText As String = returnTextBuilder.ToString()
            Dim indexOfSep As Integer = returnText.IndexOf(UntilText, System.StringComparison.Ordinal)

            ' Number of chars read past the end of the separator.
            Dim overshoot As Integer = returnText.Length - indexOfSep - UntilText.Length

            ' Rewind so the next call starts right after the separator.
            If overshoot > 0 Then
                Stream.Position -= overshoot
            End If

            Return returnText.Substring(0, indexOfSep)
        ElseIf collected.IndexOf(UntilText(0)) < 0 Then
            ' No possible start of the separator is buffered, so discard it.
            builder.Length = 0
        End If
    Loop

    ' End of file: return the last record (it has no trailing separator).
    Return returnTextBuilder.ToString()
End Function

C#

public static string ReadUntil(System.IO.FileStream stream, string untilText)
{
    // Returns the text from the current position up to (not including) the next
    // occurrence of untilText and leaves the stream positioned right after it.
    // At end of file, returns whatever text is left ("" when nothing is left).
    // Assumes a single-byte encoding (or ASCII-only text): the seek-back below
    // counts characters as bytes.
    var builder = new System.Text.StringBuilder();           // recent chars that may contain the separator
    var returnTextBuilder = new System.Text.StringBuilder(); // everything read during this call
    byte[] buffer = new byte[4096];                          // any size >= 1 works
    int currentRead = -1;

    while (currentRead != 0)
    {
        currentRead = stream.Read(buffer, 0, buffer.Length);
        string chars = System.Text.Encoding.Default.GetString(buffer, 0, currentRead);

        builder.Append(chars);
        returnTextBuilder.Append(chars);

        string collected = builder.ToString();
        if (collected.IndexOf(untilText, System.StringComparison.Ordinal) >= 0)
        {
            string returnText = returnTextBuilder.ToString();
            int indexOfSep = returnText.IndexOf(untilText, System.StringComparison.Ordinal);

            // Number of chars read past the end of the separator.
            int overshoot = returnText.Length - indexOfSep - untilText.Length;

            // Rewind so the next call starts right after the separator.
            if (overshoot > 0)
                stream.Position -= overshoot;

            return returnText.Substring(0, indexOfSep);
        }
        else if (collected.IndexOf(untilText[0]) < 0)
        {
            // No possible start of the separator is buffered, so discard it.
            builder.Length = 0;
        }
    }

    // End of file: return the last record (it has no trailing separator).
    return returnTextBuilder.ToString();
}
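Because ReadUntil moves the stream position back by a number of characters, it is only reliable when one character is one byte. A possible alternative, sketched here under the assumption that the file can be wrapped in a TextReader (such as a StreamReader with the right encoding), keeps the same "take one record off the front" idea but works on decoded characters and never seeks. SeparatedReader and ReadRecords are hypothetical names, not .NET APIs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class SeparatedReader
{
    // Hypothetical helper: lazily yields the text between occurrences of
    // `separator`, reading `chunkSize` chars at a time from `reader`.
    public static IEnumerable<string> ReadRecords(TextReader reader, string separator, int chunkSize = 4096)
    {
        if (string.IsNullOrEmpty(separator))
            throw new ArgumentException("Separator must not be empty.", nameof(separator));
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var pending = new StringBuilder(); // text read but not yet returned
        var chunk = new char[chunkSize];
        int searchFrom = 0;                // earliest index where an unseen match can start
        int read;

        while ((read = reader.Read(chunk, 0, chunk.Length)) > 0)
        {
            pending.Append(chunk, 0, read);
            string text = pending.ToString();
            int start = 0;
            int found;

            // One chunk may complete several separators.
            while ((found = text.IndexOf(separator, Math.Max(start, searchFrom), StringComparison.Ordinal)) >= 0)
            {
                yield return text.Substring(start, found - start);
                start = found + separator.Length;
            }

            pending.Remove(0, start);
            // A separator split across chunks starts at most separator.Length - 1
            // chars before the end, so earlier positions are not searched again.
            searchFrom = Math.Max(0, pending.Length - separator.Length + 1);
        }

        // Whatever is left is the last record; the file need not end with a separator.
        yield return pending.ToString();
    }
}
```

A caller would open the file with something like new StreamReader(path, Encoding.Default) and iterate ReadRecords(reader, "[STRING-SEPERATOR]") with foreach, so memory use stays around one record plus one chunk. Like string.Split, it yields an empty last record when the file ends with the separator.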

Other tips

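This should help: read the file in small chunks, append each chunk to a StringBuilder, and search the accumulated text for the separator.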

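// Needs: using System; using System.Collections.Generic; using System.IO;
//        using System.Text; using System.Windows.Forms;
private IEnumerable<string> ReadCharsByChunks(int chunkSize, string filePath)
{
    // StreamReader decodes bytes into chars, so a multi-byte character is never
    // split between two chunks (Encoding.GetString on raw byte chunks can split it).
    using (StreamReader reader = new StreamReader(filePath, Encoding.Default))
    {
        char[] buffer = new char[chunkSize];
        int currentRead;
        while ((currentRead = reader.Read(buffer, 0, chunkSize)) > 0)
        {
            yield return new string(buffer, 0, currentRead);
        }
    }
}

private void SearchWord(string searchWord)
{
    StringBuilder builder = new StringBuilder();
    foreach (var chars in ReadCharsByChunks(2, "sample.txt")) // chunk size can be any positive number
    {
        builder.Append(chars);

        var existing = builder.ToString();
        int foundIndex;
        // One chunk can complete more than one occurrence, so keep searching.
        while ((foundIndex = existing.IndexOf(searchWord, StringComparison.Ordinal)) >= 0)
        {
            // Found: existing.Substring(0, foundIndex) is the data before the separator.
            MessageBox.Show("Found");

            builder.Remove(0, foundIndex + searchWord.Length);
            existing = builder.ToString();
        }

        // If the first char of searchWord is absent, no partial match can be pending.
        if (existing.IndexOf(searchWord[0]) < 0)
        {
            builder.Clear();
        }
    }
}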

StreamReader.Read has an overload, Read(char[] buffer, int index, int count), that reads a block of characters at a time. Try this:

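const string separator = "[STRING-SEPERATOR]";
int count = 200; // chars per read, or whatever number you think is better
char[] buffer = new char[count];
var pending = new System.Text.StringBuilder(); // text read but not yet processed
using (var sr = new System.IO.StreamReader("Path here"))
{
    int read;
    // The index argument of Read is an offset into buffer, not into the file, so it stays 0.
    while ((read = sr.Read(buffer, 0, count)) > 0)
    {
        pending.Append(buffer, 0, read);
        string text = pending.ToString();
        int start = 0, found;
        // A separator split across two reads is found once the next block has been appended.
        while ((found = text.IndexOf(separator, start, System.StringComparison.Ordinal)) >= 0)
        {
            string data = text.Substring(start, found - start);
            // do your stuff with data
            start = found + separator.Length;
        }
        pending.Remove(0, start); // keep only the unprocessed tail
    }
    // pending now holds the last piece of data (the file need not end with a separator)
}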

A text file can also be read character by character, as described in this question. To find a certain string, you would then need some hand-written logic that recognizes the string from character-by-character input, which can be done with a state machine.
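A minimal sketch of that state-machine approach, assuming a TextReader (for example a StreamReader over the file) as input; SeparatorStateMachine and Split are hypothetical names. It uses the Knuth-Morris-Pratt failure table to turn the separator into an automaton: for each character read, the state (how many separator characters are currently matched) either advances or falls back to the longest separator prefix that is still matched, so no character is read twice and no seeking is needed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class SeparatorStateMachine
{
    // Hypothetical helper: splits the characters of `reader` on `separator`,
    // reading one char at a time and tracking the match with a KMP automaton.
    public static IEnumerable<string> Split(TextReader reader, string separator)
    {
        if (string.IsNullOrEmpty(separator))
            throw new ArgumentException("Separator must not be empty.", nameof(separator));

        // fail[i] = length of the longest proper prefix of separator[0..i]
        // that is also a suffix of it (the classic KMP failure table).
        int[] fail = new int[separator.Length];
        for (int i = 1, k = 0; i < separator.Length; i++)
        {
            while (k > 0 && separator[i] != separator[k]) k = fail[k - 1];
            if (separator[i] == separator[k]) k++;
            fail[i] = k;
        }

        var record = new StringBuilder(); // current record, possibly ending in a partial separator
        int state = 0;                    // number of separator chars currently matched
        int c;
        while ((c = reader.Read()) != -1)
        {
            char ch = (char)c;
            // Fall back until ch can extend the match (or the state is 0).
            while (state > 0 && ch != separator[state]) state = fail[state - 1];
            if (ch == separator[state]) state++;
            record.Append(ch);

            if (state == separator.Length)
            {
                // Full separator matched: emit the record without it and reset.
                record.Length -= separator.Length;
                yield return record.ToString();
                record.Clear();
                state = 0;
            }
        }

        // Last record (no trailing separator required), as with string.Split.
        yield return record.ToString();
    }
}
```

TextReader.Read() returns one character per call, but StreamReader buffers the underlying file internally, so this does not cause one disk read per character.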

Licensed under: CC-BY-SA with attribution