Question

I am trying to read in a table from an Access database and then sort the data in that table out to a number of text files. The clincher is that the filename to write to is dependent upon the values in each record. This is officially my first C# application, so you can consider me "green". I should also mention that I'm working off an Access database just until I can get the code hammered out; ultimately it'll be pulling from a SQL Server database with millions of records.

I have code working now, but the problem is that there are a ton of file open/close operations. I want to open each file only once for writing, since the files will be written to a network drive. This is essentially a glue app running on a server, so there are some other restrictions too: I can't save to a local drive and then copy to the network, I can't sort the query prior to pulling, and I can't adversely affect server resources while running.

Probably the best way to do this is with a hash table: check whether the file has been opened; if not, open it and save the file handle in the hash table; then close them all at once when finished. However, I cannot find an example of how to use multiple StreamWriter objects simultaneously.

I expected to find the answer to this relatively easily, but I can't seem to find a solution to this. My suspicion is that StreamWriter is the wrong class to be using for this.

The closest previous question I've been able to find is from a CodeProject page. On that page they say that the practice of keeping file handles open is bad and should be avoided, but the page explains neither why nor what the alternatives are. There is a suggestion to load the entire data set into memory and then operate on it, but that's not an option for me, as there will be too much data in the tables.

Here's what I have so far.

String strConnection;
String strQuery;
String strPunchFileNameTemplate;

// Define our Variables
strConnection = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=ClockData.accdb";
strQuery = @"SELECT * FROM ClockPunches";   
strPunchFileNameTemplate = @"C:\PUNCHES\%READER%.TXT";      

// OleDbConnection implements the IDisposable interface, so we must scope its usage.
// Set up Connection to our data source
using (OleDbConnection ConnObj = new OleDbConnection(strConnection))    {

    // Create a Command with our Query String
    OleDbCommand CmdObj = new OleDbCommand(strQuery,ConnObj);

    // Open our Connection
    ConnObj.Open();

// OleDbDataReader implements the IDisposable interface, so we must scope its usage.
    // Execute our Reader
    using (OleDbDataReader ReaderObj = CmdObj.ExecuteReader(CommandBehavior.KeyInfo))   {

        // Load the source table's schema into memory (a DataTable object)
        DataTable TableObj = ReaderObj.GetSchemaTable();

        // Parse through each record in the Reader Object
        while(ReaderObj.Read()) {

            // Extract PunchTime, CardNumber, and Device to separate variables
            DateTime dtTime = ReaderObj.GetDateTime(ReaderObj.GetOrdinal("PunchTime"));
            Int16 intID = ReaderObj.GetInt16(ReaderObj.GetOrdinal("CardNumber"));
            String strReader = ReaderObj.GetString(ReaderObj.GetOrdinal("Device"));

            // Translate the device name into a designated filename (external function)
            strReader = GetDeviceFileName(strReader);

            // Put our dynamic filename into the path template
            String pathStr = strPunchFileNameTemplate.Replace("%READER%",strReader);

            // Check to see if the file exists.  New files need an import Header
            Boolean FileExistedBool = File.Exists(pathStr);

// StreamWriter implements the IDisposable interface, so we must scope its usage.
            // Create a Text File for each Device, Append if it exists
            using (StreamWriter outSR = new StreamWriter(pathStr, true))    {

                // Write our Header if required
                if (FileExistedBool == false)   {
                    outSR.WriteLine("EXAMPLE FILE HEADER");
                }

                // Set up our string we wish to write to the file
                String outputStr = dtTime.ToString("MM-dd-yyyy HH:mm:ss") + " " + intID.ToString("000000");

                // Write the String
                outSR.WriteLine(outputStr);

                // End of StreamWriter Scope - should automatically close
            }
        }
        // End of OleDbDataREader Scope - should automatically close
    }
    // End of OleDbConnection Scope - should automatically close
}

Solution

It is quite an interesting problem you've got yourself into.

The problem with caching the file handles is that a huge number of open file handles can drain the system's resources, making the program and Windows perform badly.

If the number of devices in your database is not too high (less than 100), I think it would be safe to cache the handles.
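A minimal sketch of such a handle cache (the `WriterCache` class name is made up for illustration; the header string and append behavior mirror the code in the question):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class WriterCache : IDisposable
{
    // One StreamWriter per output file, opened on first use and kept open.
    private readonly Dictionary<string, StreamWriter> writers =
        new Dictionary<string, StreamWriter>();

    public void WriteLine(string path, string line)
    {
        StreamWriter w;
        if (!writers.TryGetValue(path, out w))
        {
            bool existed = File.Exists(path);
            w = new StreamWriter(path, true);   // append mode
            if (!existed)
                w.WriteLine("EXAMPLE FILE HEADER");
            writers.Add(path, w);
        }
        w.WriteLine(line);
    }

    // Flush and close every cached writer exactly once.
    public void Dispose()
    {
        foreach (StreamWriter w in writers.Values)
            w.Dispose();
        writers.Clear();
    }
}
```

Wrapping the cache in a `using` block in the record loop's caller ensures every handle is closed even if an exception is thrown mid-read.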

Alternatively, you could buffer a large number of records (say a million), distribute them among the different devices, write some of them out, and then read some more records.

You could place the records in a Dictionary like this:

class PunchInfo
{  
    public PunchInfo(DateTime time, int id)
    {
        Id = id;
        Time = time;
    }
    public DateTime Time;
    public int Id;
}

Dictionary<string, List<PunchInfo>> Devices = new Dictionary<string, List<PunchInfo>>();
int Count = 0;
const int Limit = 1000000;
const int LowerLimit = 90 * Limit / 100;
void SaveRecord(string device, int id, DateTime time)
{
   PunchInfo info = new PunchInfo(time, id);
   List<PunchInfo> list;
   if (!Devices.TryGetValue(device, out list))
   {
      list = new List<PunchInfo>();
      Devices.Add(device, list);
   }
   list.Add(info);
   Count++;
   if (Count >= Limit)
   {
       List<string> writeDevices = new List<string>();
       foreach(KeyValuePair<string, List<PunchInfo>> item in Devices)
       {
           writeDevices.Add(item.Key);
           Count -= item.Value.Count;
           if (Count < LowerLimit) break;
       }

       foreach(string device in writeDevices)
       {
          List<PunchInfo> list = Devices[device];
          Devices.Remove(device);
          SaveDevices(device, list);
       }
   }
}

void SaveAllDevices()
{
    foreach(KeyValuePair<string, List<PunchInfo>> item in Devices)
        SaveDevices(item.Key, item.Value);
    Devices.Clear();
}
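The `SaveDevices` method is referenced above but never shown; a minimal sketch, assuming the same `%READER%` path template and output format as the question's code, might be:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// PunchInfo as defined earlier in this answer.
class PunchInfo
{
    public PunchInfo(DateTime time, int id) { Id = id; Time = time; }
    public DateTime Time;
    public int Id;
}

static class DeviceSaver
{
    // Hypothetical SaveDevices: appends one device's buffered punches to its
    // file; the import header is written only when the file does not exist yet.
    public static void SaveDevices(string pathTemplate, string device,
                                   List<PunchInfo> list)
    {
        string path = pathTemplate.Replace("%READER%", device);
        bool existed = File.Exists(path);
        using (StreamWriter sw = new StreamWriter(path, true)) // append mode
        {
            if (!existed)
                sw.WriteLine("EXAMPLE FILE HEADER");
            foreach (PunchInfo info in list)
                sw.WriteLine(info.Time.ToString("MM-dd-yyyy HH:mm:ss")
                             + " " + info.Id.ToString("000000"));
        }
    }
}
```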

This way you avoid repeatedly opening and closing files, as well as having a large number of files open at once.

One million records takes up about 20 MB of memory; you could easily raise that to 10 million records without problems.

OTHER TIPS

You need to set up an array of writers. Here is an example of how to do it:

namespace example
{
    class Program
    {
        public static StreamWriter[] writer = new StreamWriter[3];

        static void Main(string[] args)
        {
            writer[0] = new StreamWriter("YourFile1.txt");
            writer[1] = new StreamWriter("YourFile2.txt");
            writer[2] = new StreamWriter("YourFile3.txt");

            writer[0].WriteLine("Line in YourFile1.");
            writer[1].WriteLine("Line in YourFile2.");
            writer[2].WriteLine("Line in YourFile3.");

            writer[0].Close();
            writer[1].Close();
            writer[2].Close();
        }
    }
}

Keeping 100 or 1,000 file handles open in a single process over a longer period was generally considered questionable, but times have changed and it is no longer an issue. So if the situation calls for it, do it.

I was in the position of keeping 100, 1,000 or even 5,000 files open in a process that analyzes data in those files, sometimes for hours on end. I measured whether file read/write performance on Windows would go down, and that is not at all the case. Thanks to the memory resources of modern machines, having 5,000 file descriptors in memory on the OS side no longer causes any issues. The OS keeps them sorted (I guess), so looking up a descriptor is O(log n), and nothing measurable happens.

Keeping these handles (file descriptor structs) open is certainly far better than filling memory up with data and then flushing it to disk, file by file.

I can suggest keeping your data in memory and writing to disk only when you reach a certain threshold:

const int MAX_MEMORY_BUFFER = 100000; // To be defined according to your memory limits
String strConnection;
String strQuery;
String strPunchFileNameTemplate;

strConnection = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=ClockData.accdb";
strQuery = @"SELECT * FROM ClockPunches";   
strPunchFileNameTemplate = @"C:\PUNCHES\%READER%.TXT";      

Dictionary<string, StringBuilder> data = new Dictionary<string, StringBuilder>();

using (OleDbConnection ConnObj = new OleDbConnection(strConnection))    
{
    OleDbCommand CmdObj = new OleDbCommand(strQuery,ConnObj);
    ConnObj.Open();

    using (OleDbDataReader ReaderObj = CmdObj.ExecuteReader(CommandBehavior.KeyInfo))   
    {
        while(ReaderObj.Read()) 
        {
            DateTime dtTime = ReaderObj.GetDateTime(ReaderObj.GetOrdinal("PunchTime"));
            Int16 intID = ReaderObj.GetInt16(ReaderObj.GetOrdinal("CardNumber"));
            String strReader = ReaderObj.GetString(ReaderObj.GetOrdinal("Device"));

            strReader = GetDeviceFileName(strReader);

            bool dataPresent = data.ContainsKey(strReader);
            if (dataPresent == false)   
            {
                StringBuilder sb = new StringBuilder("EXAMPLE FILE HEADER\r\n");
                data.Add(strReader, sb);
            }

            String outputStr = dtTime.ToString("MM-dd-yyyy HH:mm:ss") + " " + intID.ToString("000000");
            StringBuilder sb = data[strReader];
            sb.AppendLine(outputStr);
            if (sb.Length > MAX_MEMORY_BUFFER)
            {
                String pathStr = strPunchFileNameTemplate.Replace("%READER%",strReader);
                using (StreamWriter sw = new StreamWriter(pathStr, true)) // Append mode
                {
                    // Write the buffer and reset its length to zero
                    sw.Write(sb.ToString());
                    sb.Length = 0;
                }
            }
        }
    }

    // Write all the data remaining in memory
    foreach(KeyValuePair<string, StringBuilder> info in data)
    {
        if(info.Value.Length > 0)
        {
          String pathStr = strPunchFileNameTemplate.Replace("%READER%",info.Key);
          using (StreamWriter sw = new StreamWriter(pathStr, true)) // Append mode
          {
              sw.Write(info.Value.ToString());
          }
        }
    }
}

This code needs to be tested, but I hope it gives you the general idea. In this way you can balance your I/O operations: increase the memory buffer to lower the number of writes, and vice versa. Of course, you now also need to consider the memory available to store your data.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow