What is the equivalent 'streams' code of TStringList.SaveToFile and which is better for large amounts of data?

StackOverflow https://stackoverflow.com/questions/14602238

  •  06-03-2022

Question

The following console application uses TStringList.SaveToFile to write multiple lines to a text file:

program Project1;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.SysUtils,
  System.Classes;
var
  i: Integer;
  a,b,c: Single;
  myString : String;
  myStringList : TStringList;
begin
  try
    Randomize;
    myStringList := TStringList.Create; 
    for i := 0 to 1000000 do
    begin
      a := Random;
      b := Random;
      c := Random;
      myString := FloatToStr(a) + Char(9) + FloatToStr(b) + Char(9) + FloatToStr(c);
      myStringList.Add(myString);
    end;
    myStringList.SaveToFile('Output.txt');
    myStringList.Free;
    WriteLn('Done');
    Sleep(10000);
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.

It takes around 3 seconds to write a >50MB file with 1000001 lines and seems to work fine. However, many people advocate using streams for such processes. What would the stream equivalent be and what are the advantages/disadvantages of using it compared to TStringList.SaveToFile?

Solution

It may be faster to write directly to a stream. Or it may not. I suggest you try it out and time both options. Writing to a stream looks like this:

for i := 0 to 1000000 do
begin
  a := Random;
  b := Random;
  c := Random;
  myString := FloatToStr(a) + Char(9) + FloatToStr(b) + Char(9) + 
    FloatToStr(c) + sLineBreak;
  Stream.WriteBuffer(myString[1], Length(myString)*SizeOf(myString[1]));
end;

To have any hope of this version being fast, you need to use a buffered stream. Try this one: Buffered files (for faster disk access).

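The code above outputs UTF-16 text on modern (Unicode) Delphi. If you want to output ANSI text, simply declare myString as AnsiString; SizeOf(myString[1]) then evaluates to 1, so the byte count passed to WriteBuffer stays correct.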
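
If the output encoding should be explicit rather than implied by the string type, one option is to convert each line to bytes with TEncoding before writing. The following is only a minimal sketch, assuming a Unicode Delphi with namespaced units (XE2 or later); the helper name WriteLineAs and the sample text are made up for illustration.

```delphi
program EncodingSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Classes;

// Hypothetical helper: encodes one line (plus a line break) with the
// given encoding and writes the resulting bytes to the stream.
procedure WriteLineAs(Stream: TStream; const Line: string; Encoding: TEncoding);
var
  Bytes: TBytes;
begin
  Bytes := Encoding.GetBytes(Line + sLineBreak);
  if Length(Bytes) > 0 then
    Stream.WriteBuffer(Bytes[0], Length(Bytes));
end;

var
  Stream: TFileStream;
begin
  Stream := TFileStream.Create('Output.txt', fmCreate);
  try
    // UTF-8 output; GetBytes does not emit a byte order mark.
    WriteLineAs(Stream, 'alpha' + #9 + 'beta', TEncoding.UTF8);
  finally
    Stream.Free;
  end;
end.
```

Passing TEncoding.Unicode or TEncoding.Default instead should give UTF-16 or ANSI output without changing the type of the string variable.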

I'll let you do the timing, but my guess is that this variant performs similarly to the string list. I suspect that the time is spent calling Random and FloatToStr. I expect that the file saving with the string list is already very fast.
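
One way to check that guess is to time the generation and the saving separately. This is a rough sketch, assuming Delphi XE2 or later for the System.Diagnostics unit and its TStopwatch record; the program name is arbitrary and no timings are claimed here, since they will vary by machine.

```delphi
program TimingSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Classes,
  System.Diagnostics;

var
  i: Integer;
  a, b, c: Single;
  List: TStringList;
  Watch: TStopwatch;
begin
  Randomize;
  List := TStringList.Create;
  try
    // Phase 1: build the lines in memory (Random, FloatToStr, Add).
    Watch := TStopwatch.StartNew;
    for i := 0 to 1000000 do
    begin
      a := Random;
      b := Random;
      c := Random;
      List.Add(FloatToStr(a) + #9 + FloatToStr(b) + #9 + FloatToStr(c));
    end;
    Writeln('Generate: ', Watch.ElapsedMilliseconds, ' ms');

    // Phase 2: write the whole list to disk in one call.
    Watch := TStopwatch.StartNew;
    List.SaveToFile('Output.txt');
    Writeln('Save:     ', Watch.ElapsedMilliseconds, ' ms');
  finally
    List.Free;
  end;
end.
```

If the first figure dominates, switching the output technique is unlikely to change the total run time much.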

Putting speed to one side, there is another benefit of this approach. In the string list approach, as per the code in the question, the entire content of the text file is stored in memory. And when you save the file, another copy is made as part of the save procedure. So you will have two copies of the entire file in memory.

In contrast, when saving directly to a stream, the only memory requirement is whatever buffer your stream class uses. For a 50MB file, as in the question, there's likely no real problem with either approach. For a much larger file, however, you will run into out-of-memory errors if you try to hold the entire file in memory.


Personally though, I'd consider making use of the TStreamWriter class. This useful class separates the concerns of writing data (text, values etc.) from the concern of pushing to a stream. Your code would become:

Writer := TStreamWriter.Create(Stream);//use whatever stream you like
try
  for i := 0 to 1000000 do
  begin
    a := Random;
    b := Random;
    c := Random;
    Writer.WriteLine(FloatToStr(a) + Char(9) + FloatToStr(b) + Char(9) +
      FloatToStr(c));
  end;
finally
  Writer.Free;
end;

TStreamWriter implements buffering with a 1KB buffer, so you can use it on top of a plain TFileStream and expect to get reasonable performance.
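
For completeness, here is how the snippet above might look as a whole program with an explicit encoding and buffer size. Treat it as a sketch under assumptions: the 64 KB buffer is an arbitrary value to be tuned by measurement, UTF-8 is just one possible encoding, and switching AutoFlush off is a precaution whose effect is worth verifying in your Delphi version.

```delphi
program StreamWriterSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Classes;

const
  WriterBufferSize = 64 * 1024; // assumed value, not a recommendation

var
  i: Integer;
  a, b, c: Single;
  Stream: TFileStream;
  Writer: TStreamWriter;
begin
  Randomize;
  Stream := TFileStream.Create('Output.txt', fmCreate);
  try
    // This overload takes the target stream, an encoding and a buffer size.
    Writer := TStreamWriter.Create(Stream, TEncoding.UTF8, WriterBufferSize);
    try
      // Ask the writer not to flush after every write (assumption: this
      // matters for throughput; check the default in your version).
      Writer.AutoFlush := False;
      for i := 0 to 1000000 do
      begin
        a := Random;
        b := Random;
        c := Random;
        Writer.WriteLine(FloatToStr(a) + #9 + FloatToStr(b) + #9 +
          FloatToStr(c));
      end;
    finally
      Writer.Free; // pending data is written out when the writer is destroyed
    end;
  finally
    Stream.Free; // the writer does not own the stream in this overload
  end;
end.
```

The writer is freed before the stream so that any buffered text reaches the file while the stream is still open.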


I would recommend that you choose the technique that leads to the most readable code. If performance becomes an issue you can optimise that later. My personal preference would be for TStreamWriter. This gives very clean and readable code, yet also excellent separation of content generation from streaming. The performance is perfectly reasonable also.

OTHER TIPS

A TFileStream based solution would look as follows, but there are some important points:

  • The TFileStream code is slower. TFileStream does no buffering, and writing about 20 bytes at a time to a file is inefficient. TStringList buffers everything in RAM and saves it all at once. That is optimal for speed, but it uses a lot of RAM.
  • In the TStringList-based variant, 50% of the time is spent in Random, which is to be expected.
  • For the TFileStream solution to become more efficient, you'd need to roll your own buffering scheme so that a reasonable amount (for example, 4 KB) is written to disk each time.

Code:

program Project9;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  SysUtils,
  Classes,
  DateUtils;
var
  i: Integer;
  a,b,c: Single;
  myString : AnsiString;
  StartTime: TDateTime;
  F: TFileStream;
begin
  try
    Randomize;
    StartTime := Now;
    F := TFileStream.Create('Output.txt', fmCreate);
    try
      for i := 0 to 1000000 do
      begin
        a := Random;
        b := Random;
        c := Random;
        // Build one tab-separated, CRLF-terminated line per iteration.
        myString := AnsiString(Format('%f'#9'%f'#9'%f'#13#10, [a, b, c]));
        F.WriteBuffer(myString[1], Length(myString));
      end;
    finally
      F.Free;
    end;
    // Report the total elapsed time; MilliSecondsBetween is declared in DateUtils.
    WriteLn('Done. ', MilliSecondsBetween(Now, StartTime), ' ms');
    ReadLn;
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.
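
The buffering scheme mentioned in the third point of the list above could be sketched roughly as follows. This is not part of the original answer: the names BufferedWrite and FlushBuffer are hypothetical, the 4 KB threshold simply follows the example figure given in the list, and the code assumes a Delphi version with namespaced units.

```delphi
program BufferedWriteSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Classes;

const
  BufferSize = 4096; // flush threshold taken from the 4 KB example

var
  F: TFileStream;
  Buffer: array[0..BufferSize - 1] of Byte;
  Used: Integer = 0; // number of bytes currently held in Buffer

// Hypothetical helper: writes any pending bytes to the file.
procedure FlushBuffer;
begin
  if Used > 0 then
  begin
    F.WriteBuffer(Buffer[0], Used);
    Used := 0;
  end;
end;

// Hypothetical helper: appends S to the buffer, flushing first when
// the remaining space is too small.
procedure BufferedWrite(const S: AnsiString);
begin
  if Length(S) = 0 then
    Exit;
  if Length(S) > BufferSize - Used then
    FlushBuffer;
  if Length(S) > BufferSize then
    F.WriteBuffer(S[1], Length(S)) // larger than the buffer: write directly
  else
  begin
    Move(S[1], Buffer[Used], Length(S));
    Inc(Used, Length(S));
  end;
end;

var
  i: Integer;
  a, b, c: Single;
begin
  Randomize;
  F := TFileStream.Create('Output.txt', fmCreate);
  try
    for i := 0 to 1000000 do
    begin
      a := Random;
      b := Random;
      c := Random;
      BufferedWrite(AnsiString(Format('%f'#9'%f'#9'%f'#13#10, [a, b, c])));
    end;
    FlushBuffer; // write whatever is still buffered
  finally
    F.Free;
  end;
end.
```

With this layout the file is written in blocks of up to 4 KB instead of one short line at a time; whether that closes the gap to TStringList is something to measure rather than assume.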
Licensed under: CC-BY-SA with attribution