Reading a text file line by line, with exact offset/position reporting

StackOverflow https://stackoverflow.com/questions/2594125

  •  25-09-2019

Question

My simple requirement: reading a huge (more than a million lines) test file line by line (for this example, assume it is a CSV file of some sort) while keeping a reference to the beginning of each line for faster lookup later (read a line, starting at X).

I tried the naive and easy way first, using a StreamReader and accessing the underlying BaseStream.Position. Unfortunately, that doesn't work as I intended:

Given a file containing the following

Foo
Bar
Baz
Bla
Fasel

and this very simple code

using (var sr = new StreamReader(@"C:\Temp\LineTest.txt")) {
  string line;
  long pos = sr.BaseStream.Position;
  while ((line = sr.ReadLine()) != null) {
    Console.Write("{0:d3} ", pos);
    Console.WriteLine(line);
    pos = sr.BaseStream.Position;
  }
}

the output is:

000 Foo
025 Bar
025 Baz
025 Bla
025 Fasel

I can imagine that the stream is trying to be helpful/efficient and probably reads in (big) chunks whenever new data is needed. For me, this is bad.
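The buffering suspected above can be reproduced without a file on disk. The following is a minimal sketch, assuming the sample text uses Windows line endings and ASCII encoding (25 bytes in total); the class and method names are hypothetical. After a single ReadLine() call, the underlying stream has already been consumed to its end, which would explain a position that no longer moves.

```csharp
using System;
using System.IO;
using System.Text;

public static class BufferingDemo
{
    // Returns BaseStream.Position right after the first ReadLine() call.
    public static long PositionAfterFirstLine()
    {
        // Same five lines as the sample file, assumed to use \r\n (25 bytes in ASCII).
        byte[] data = Encoding.ASCII.GetBytes("Foo\r\nBar\r\nBaz\r\nBla\r\nFasel");

        using (var stream = new MemoryStream(data))
        using (var reader = new StreamReader(stream, Encoding.ASCII))
        {
            // Returns "Foo", but the reader fills its internal buffer first,
            // which here swallows the whole 25-byte stream.
            reader.ReadLine();
            return reader.BaseStream.Position;
        }
    }
}
```

Calling BufferingDemo.PositionAfterFirstLine() returns the stream length rather than 5, because BaseStream.Position reports how far the reader's buffer has read, not how far the caller has consumed.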

The question, finally: is there any way to get the (byte, char) offset while reading a file line by line, without using a basic Stream and messing with \r, \n, \r\n, string encoding, etc. manually? Not a big deal, really, I just don't like building things that might already exist.


Solution

You can create a TextReader wrapper that tracks the current position in the base TextReader:

public class TrackingTextReader : TextReader
{
    private readonly TextReader _baseReader;
    private int _position; // counted in characters, not bytes

    public TrackingTextReader(TextReader baseReader)
    {
        _baseReader = baseReader;
    }

    public override int Read()
    {
        int c = _baseReader.Read();
        // Only advance when a character was actually read (-1 means end of input).
        if (c != -1)
            _position++;
        return c;
    }

    public override int Peek()
    {
        return _baseReader.Peek();
    }

    public int Position
    {
        get { return _position; }
    }
}

You can then use it as follows:

string text = @"Foo
Bar
Baz
Bla
Fasel";

using (var reader = new StringReader(text))
using (var trackingReader = new TrackingTextReader(reader))
{
    string line;
    int start = trackingReader.Position; // character offset where the next line begins
    while ((line = trackingReader.ReadLine()) != null)
    {
        Console.WriteLine("{0:d3} {1}", start, line);
        start = trackingReader.Position;
    }
}

Other tips

After searching, testing and doing something crazy, here is the code that solved it for me (I am currently using this code in my product).

public sealed class TextFileReader : IDisposable
{

    FileStream _fileStream = null;
    BinaryReader _binReader = null;
    StreamReader _streamReader = null;
    List<string> _lines = null;
    long _length = -1;

    /// <summary>
    /// Initializes a new instance of the <see cref="TextFileReader"/> class with default encoding (UTF8).
    /// </summary>
    /// <param name="filePath">The path to text file.</param>
    public TextFileReader(string filePath) : this(filePath, Encoding.UTF8) { }

    /// <summary>
    /// Initializes a new instance of the <see cref="TextFileReader"/> class.
    /// </summary>
    /// <param name="filePath">The path to text file.</param>
    /// <param name="encoding">The encoding of text file.</param>
    public TextFileReader(string filePath, Encoding encoding)
    {
        if (!File.Exists(filePath))
            throw new FileNotFoundException("File (" + filePath + ") is not found.");

        _fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read);
        _length = _fileStream.Length;
        _binReader = new BinaryReader(_fileStream, encoding);
    }

    /// <summary>
    /// Reads a line of characters from the current stream at the current position and returns the data as a string.
    /// </summary>
    /// <returns>The next line from the input stream, or null if the end of the input stream is reached</returns>
    public string ReadLine()
    {
        if (_binReader.PeekChar() == -1)
            return null;

        var line = new StringBuilder(); // avoids quadratic string concatenation on long lines
        int nextChar = _binReader.Read();
        while (nextChar != -1)
        {
            char current = (char)nextChar;
            if (current.Equals('\n'))
                break;
            else if (current.Equals('\r'))
            {
                int pickChar = _binReader.PeekChar();
                if (pickChar != -1 && ((char)pickChar).Equals('\n'))
                    nextChar = _binReader.Read();
                break;
            }
            else
                line.Append(current);
            nextChar = _binReader.Read();
        }
        return line.ToString();
    }

    /// <summary>
    /// Reads some lines of characters from the current stream at the current position and returns the data as a collection of string.
    /// </summary>
    /// <param name="totalLines">The total number of lines to read (set as 0 to read from current position to end of file).</param>
    /// <returns>The next lines from the input stream, or an empty collection if the end of the input stream is reached</returns>
    public List<string> ReadLines(int totalLines)
    {
        if (totalLines < 1 && this.Position == 0)
            return this.ReadAllLines();

        _lines = new List<string>();
        int counter = 0;
        string line = this.ReadLine();
        while (line != null)
        {
            _lines.Add(line);
            counter++;
            if (totalLines > 0 && counter >= totalLines)
                break;
            line = this.ReadLine();
        }
        return _lines;
    }

    /// <summary>
    /// Reads all lines of characters from the current stream (from the begin to end) and returns the data as a collection of string.
    /// </summary>
    /// <returns>All lines from the input stream, or an empty collection if the stream is empty</returns>
    public List<string> ReadAllLines()
    {
        if (_streamReader == null)
            _streamReader = new StreamReader(_fileStream);
        _streamReader.BaseStream.Seek(0, SeekOrigin.Begin);
        _lines = new List<string>();
        string line = _streamReader.ReadLine();
        while (line != null)
        {
            _lines.Add(line);
            line = _streamReader.ReadLine();
        }
        return _lines;
    }

    /// <summary>
    /// Gets the length of text file (in bytes).
    /// </summary>
    public long Length
    {
        get { return _length; }
    }

    /// <summary>
    /// Gets or sets the current reading position.
    /// </summary>
    public long Position
    {
        get
        {
            if (_binReader == null)
                return -1;
            else
                return _binReader.BaseStream.Position;
        }
        set
        {
            if (_binReader == null)
                return;
            else if (value >= this.Length)
                this.SetPosition(this.Length);
            else
                this.SetPosition(value);
        }
    }

    void SetPosition(long position)
    {
        _binReader.BaseStream.Seek(position, SeekOrigin.Begin);
    }

    /// <summary>
    /// Gets the lines after reading.
    /// </summary>
    public List<string> Lines
    {
        get
        {
            return _lines;
        }
    }

    /// <summary>
    /// Performs application-defined tasks associated with freeing, releasing, or resetting unmanaged resources.
    /// </summary>
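    public void Dispose()
    {
        // Closing a reader also closes the underlying FileStream; disposing the
        // stream again afterwards is harmless. No finalizer is declared because
        // the class only holds managed objects, which must not be touched from one.
        if (_binReader != null)
            _binReader.Close();
        if (_streamReader != null)
            _streamReader.Dispose();
        if (_fileStream != null)
            _fileStream.Dispose();
    }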
}
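The Position getter and setter above exist so that a line's byte offset can be recorded once and used for seeking later. The sketch below shows that record-then-seek idea on a plain Stream. It is not part of the answer: the LineIndex class and its methods are hypothetical, and it assumes an encoding such as ASCII or UTF-8 in which the byte 0x0A always means '\n'.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class LineIndex
{
    // Scans the stream once and returns the byte offset at which each line starts.
    // Assumption: the byte 0x0A only ever encodes '\n' (true for ASCII and UTF-8).
    public static List<long> Build(Stream stream)
    {
        var offsets = new List<long>();
        var buffer = new byte[64 * 1024];
        long position = 0;       // offset of buffer[0] within the stream
        bool atLineStart = true; // true when the next byte begins a new line
        int read;

        stream.Seek(0, SeekOrigin.Begin);
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                if (atLineStart)
                {
                    offsets.Add(position + i);
                    atLineStart = false;
                }
                if (buffer[i] == (byte)'\n')
                    atLineStart = true;
            }
            position += read;
        }
        return offsets;
    }

    // Seeks to a previously recorded offset and reads a single line from there.
    public static string ReadLineAt(Stream stream, long offset)
    {
        stream.Seek(offset, SeekOrigin.Begin);
        // A fresh reader per call keeps the sketch simple; leaveOpen: true
        // prevents the reader from closing the caller's stream.
        using (var reader = new StreamReader(stream, Encoding.UTF8, false, 1024, true))
        {
            return reader.ReadLine();
        }
    }
}
```

For the five-line sample with \r\n endings, Build yields the offsets 0, 5, 10, 15 and 20, and ReadLineAt(stream, 10) returns "Baz". Creating a reader per lookup is wasteful for heavy use; it is chosen here only to keep the example short.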

Although Thomas Levesque's solution works well, here is mine. It uses reflection, so it will be slower, but it is encoding-independent. I also added a Seek extension.

/// <summary>Useful <see cref="StreamReader"/> extensions.</summary>
public static class StreamReaderExtentions
{
    /// <summary>Gets the position within the <see cref="StreamReader.BaseStream"/> of the <see cref="StreamReader"/>.</summary>
    /// <remarks><para>This method is quite slow. It uses reflection to access private <see cref="StreamReader"/> fields. Don't use it too often.</para></remarks>
    /// <param name="streamReader">Source <see cref="StreamReader"/>.</param>
    /// <exception cref="ArgumentNullException">Occurs when passed <see cref="StreamReader"/> is null.</exception>
    /// <returns>The current position of this stream.</returns>
    public static long GetPosition(this StreamReader streamReader)
    {
        if (streamReader == null)
            throw new ArgumentNullException("streamReader");

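        // Note: these private field names are implementation details of the .NET Framework
        // StreamReader; other runtimes may name them differently (e.g. with a leading underscore).
        var charBuffer = (char[])streamReader.GetType().InvokeMember("charBuffer", BindingFlags.DeclaredOnly | BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.GetField, null, streamReader, null);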
        var charPos = (int)streamReader.GetType().InvokeMember("charPos", BindingFlags.DeclaredOnly | BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.GetField, null, streamReader, null);
        var charLen = (int)streamReader.GetType().InvokeMember("charLen", BindingFlags.DeclaredOnly | BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.GetField, null, streamReader, null);

        var offsetLength = streamReader.CurrentEncoding.GetByteCount(charBuffer, charPos, charLen - charPos);

        return streamReader.BaseStream.Position - offsetLength;
    }

    /// <summary>Sets the position within the <see cref="StreamReader.BaseStream"/> of the <see cref="StreamReader"/>.</summary>
    /// <remarks>
    /// <para><see cref="StreamReader.BaseStream"/> should be seekable.</para>
    /// <para>This method is quite slow. It uses reflection and flushes the charBuffer of the <see cref="StreamReader.BaseStream"/>. Don't use it too often.</para>
    /// </remarks>
    /// <param name="streamReader">Source <see cref="StreamReader"/>.</param>
    /// <param name="position">The point relative to origin from which to begin seeking.</param>
    /// <param name="origin">Specifies the beginning, the end, or the current position as a reference point for origin, using a value of type <see cref="SeekOrigin"/>. </param>
    /// <exception cref="ArgumentNullException">Occurs when passed <see cref="StreamReader"/> is null.</exception>
    /// <exception cref="ArgumentException">Occurs when <see cref="StreamReader.BaseStream"/> is not seekable.</exception>
    /// <returns>The new position in the stream. This position can differ from <paramref name="position"/> because of the preamble.</returns>
    public static long Seek(this StreamReader streamReader, long position, SeekOrigin origin)
    {
        if (streamReader == null)
            throw new ArgumentNullException("streamReader");

        if (!streamReader.BaseStream.CanSeek)
            throw new ArgumentException("Underlying stream should be seekable.", "streamReader");

        var preamble = (byte[])streamReader.GetType().InvokeMember("_preamble", BindingFlags.DeclaredOnly | BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.GetField, null, streamReader, null);
        if (preamble.Length > 0 && position < preamble.Length) // preamble or BOM must be skipped
            position += preamble.Length;

        var newPosition = streamReader.BaseStream.Seek(position, origin); // seek
        streamReader.DiscardBufferedData(); // this updates the buffer

        return newPosition;
    }
}

This is a really tough question. After a very long and exhausting review of the various solutions on the Internet (including the ones in this thread, thank you!), I had to reinvent my own wheel.

I had the following requirements:

  • Performance - reading must be very fast, so reading one char at a time or using reflection is not acceptable; buffering is required
  • Streaming - the file can be huge, so reading it entirely into memory is not acceptable
  • Tailing - tailing the file should be possible
  • Long lines - lines can be very long, so the buffer size cannot be fixed
  • Stable - a single-byte error was immediately visible during use. Unfortunately for me, several implementations I found had stability problems

    public class OffsetStreamReader
    {
        private const int InitialBufferSize = 4096;    
        private readonly char _bom;
        private readonly byte _end;
        private readonly Encoding _encoding;
        private readonly Stream _stream;
        private readonly bool _tail;
    
        private byte[] _buffer;
        private int _processedInBuffer;
        private int _informationInBuffer;
    
        public OffsetStreamReader(Stream stream, bool tail)
        {
            _buffer = new byte[InitialBufferSize];
            _processedInBuffer = InitialBufferSize;
    
            if (stream == null || !stream.CanRead)
                throw new ArgumentException("stream");
    
            _stream = stream;
            _tail = tail;
            _encoding = Encoding.UTF8;
    
            _bom = '\uFEFF';
            _end = _encoding.GetBytes(new [] {'\n'})[0];
        }
    
        public long Offset { get; private set; }
    
        public string ReadLine()
        {
            // Underlying stream closed
            if (!_stream.CanRead)
                return null;
    
            // EOF
            if (_processedInBuffer == _informationInBuffer)
            {
                if (_tail)
                {
                    _processedInBuffer = _buffer.Length;
                    _informationInBuffer = 0;
                    ReadBuffer();
                }
    
                return null;
            }
    
            var lineEnd = Search(_buffer, _end, _processedInBuffer);
            var haveEnd = true;
    
            // File ended but no finalizing newline character
            if (lineEnd.HasValue == false && _informationInBuffer + _processedInBuffer < _buffer.Length)
            {
                if (_tail)
                    return null;
                else
                {
                    lineEnd = _informationInBuffer;
                    haveEnd = false;
                }
            }
    
            // No end in current buffer
            if (!lineEnd.HasValue)
            {
                ReadBuffer();
                if (_informationInBuffer != 0)
                    return ReadLine();
    
                return null;
            }
    
            var arr = new byte[lineEnd.Value - _processedInBuffer];
            Array.Copy(_buffer, _processedInBuffer, arr, 0, arr.Length);
    
            Offset = Offset + lineEnd.Value - _processedInBuffer + (haveEnd ? 1 : 0);
            _processedInBuffer = lineEnd.Value + (haveEnd ? 1 : 0);
    
            return _encoding.GetString(arr).TrimStart(_bom).TrimEnd('\r', '\n');
        }
    
        private void ReadBuffer()
        {
            var notProcessedPartLength = _buffer.Length - _processedInBuffer;
    
            // Extend buffer to be able to fit whole line to the buffer
            // Was     [NOT_PROCESSED]
            // Become  [NOT_PROCESSED        ]
            if (notProcessedPartLength == _buffer.Length)
            {
                var extendedBuffer = new byte[_buffer.Length + _buffer.Length/2];
                Array.Copy(_buffer, extendedBuffer, _buffer.Length);
                _buffer = extendedBuffer;
            }
    
            // Copy not processed information to the beginning
            // Was    [PROCESSED NOT_PROCESSED]
            // Become [NOT_PROCESSED          ]
            Array.Copy(_buffer, (long) _processedInBuffer, _buffer, 0, notProcessedPartLength);
    
            // Read more information to the empty part of buffer
            // Was    [ NOT_PROCESSED                   ]
            // Become [ NOT_PROCESSED NEW_NOT_PROCESSED ]
            _informationInBuffer = notProcessedPartLength + _stream.Read(_buffer, notProcessedPartLength, _buffer.Length - notProcessedPartLength);
    
            _processedInBuffer = 0;
        }
    
        private int? Search(byte[] buffer, byte byteToSearch, int bufferOffset)
        {
            for (int i = bufferOffset; i < buffer.Length - 1; i++)
            {
                if (buffer[i] == byteToSearch)
                    return i;
            }
            return null;
        }
    }
    

Would this work:

using (var sr = new StreamReader(@"C:\Temp\LineTest.txt")) {
  string line;
  long pos = 0;
  while ((line = sr.ReadLine()) != null) {
    Console.Write("{0:d3} ", pos);
    Console.WriteLine(line);
    pos += line.Length;
  }
}
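As written, the snippet above adds only line.Length, so it counts characters and leaves out the line terminator that ReadLine() strips. A hedged variant is sketched below. It assumes the file has no BOM, that every line ends with the same known terminator, and that the encoding is known up front; the LineOffsets class and its parameters are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class LineOffsets
{
    // Returns the byte offset of the start of each line.
    // Assumptions: no BOM, every line ends with the same terminator (newline),
    // and the text is encoded with the given encoding.
    public static List<long> FromReader(TextReader reader, Encoding encoding, string newline)
    {
        var offsets = new List<long>();
        int newlineBytes = encoding.GetByteCount(newline);
        long pos = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            offsets.Add(pos);
            // ReadLine() strips the terminator, so its size is added back here.
            pos += encoding.GetByteCount(line) + newlineBytes;
        }
        return offsets;
    }
}
```

For example, FromReader(new StringReader("Foo\r\nB\u00e4r\r\nBaz"), Encoding.UTF8, "\r\n") yields 0, 5 and 11, since the non-ASCII character takes two bytes in UTF-8. If the file mixes \n and \r\n, the assumptions no longer hold and the offsets drift.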
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow