Вопрос

I have a tab delimited txt file with 500K records. I'm using the code below to read data to dataset. With 50K it works fine but 500K it gives "Exception of type 'System.OutOfMemoryException' was thrown."

What is the more efficient way to read large tab delimited data? Or how to resolve this issue? Please give me an example

public DataSet DataToDataSet(string fullpath, string file)
{
    string sql = "SELECT * FROM " + file; // Read all the data
    OleDbConnection connection = new OleDbConnection // Connection
                  ("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + fullpath + ";"
                   + "Extended Properties=\"text;HDR=YES;FMT=Delimited\"");
    OleDbDataAdapter ole = new OleDbDataAdapter(sql, connection); // Load the data into the adapter
    DataSet dataset = new DataSet(); // To hold the data
    ole.Fill(dataset); // Fill the dataset with the data from the adapter
    connection.Close(); // Close the connection
    connection.Dispose(); // Dispose of the connection
    ole.Dispose(); // Get rid of the adapter
    return dataset;
}
Это было полезно?

Решение

Use a stream approach with TextFieldParser - this way you will not load the whole file into memory in one go.

Другие советы

You really want to enumerate the source file and process each line at a time. I use the following

    public static IEnumerable<string> EnumerateLines(this FileInfo file)
    {
        using (var stream = File.Open(file.FullName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (var reader = new StreamReader(stream))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }

Then for each line you can split it using tabs and process each line at a time. This keeps memory down really low for the parsing, you only use memory if the application needs it.

Have you tried the TextReader?

  using (TextReader tr = File.OpenText(YourFile))
  {
      string strLine = string.Empty;
      string[] arrColumns = null;
      while ((strLine = tr.ReadLine()) != null)
      {
           arrColumns = strLine .Split('\t');
           // Start Fill Your DataSet or Whatever you wanna do with your data
      }
      tr.Close();
  }

I found FileHelpers

The FileHelpers are a free and easy to use .NET library to import/export data from fixed length or delimited records in files, strings or streams.

Maybe it can help.

Лицензировано под: CC-BY-SA с атрибуция
Не связан с StackOverflow
scroll top