An efficient way to read a large tab-delimited txt file?
-
15-11-2019
Problem
I have a tab-delimited TXT file with 500K records. I am reading the data into a DataSet using the code below. 50K works fine, but at 500K I get "Exception of type 'System.OutOfMemoryException' was thrown."
What is a more efficient way to read large tab-delimited data? Or how can I work around this problem? Please give me an example.
public DataSet DataToDataSet(string fullpath, string file)
{
    string sql = "SELECT * FROM " + file; // Read all the data
    OleDbConnection connection = new OleDbConnection // Connection
        ("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + fullpath + ";"
        + "Extended Properties=\"text;HDR=YES;FMT=Delimited\"");
    OleDbDataAdapter ole = new OleDbDataAdapter(sql, connection); // Load the data into the adapter
    DataSet dataset = new DataSet(); // To hold the data
    ole.Fill(dataset); // Fill the dataset with the data from the adapter
    connection.Close(); // Close the connection
    connection.Dispose(); // Dispose of the connection
    ole.Dispose(); // Get rid of the adapter
    return dataset;
}
Solution
Use a streaming approach with TextFieldParser - that way you will not load the whole file into memory in one go.
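A minimal sketch of that streaming approach with TextFieldParser (from the Microsoft.VisualBasic.FileIO namespace). The sample file and path here are assumptions just to make the sketch self-contained; in practice you would point the parser at your 500K-record file:

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO; // requires a reference to Microsoft.VisualBasic

class TextFieldParserDemo
{
    static void Main()
    {
        // Hypothetical sample data so the sketch runs on its own;
        // replace this with your real file path.
        string path = Path.Combine(Path.GetTempPath(), "sample.txt");
        File.WriteAllText(path, "Id\tName\n1\tAlice\n2\tBob\n");

        int records = 0;
        using (var parser = new TextFieldParser(path))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters("\t");

            if (!parser.EndOfData)
                parser.ReadFields(); // skip the header row (HDR=YES in the original code)

            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields(); // one record at a time
                records++; // process fields here instead of buffering the whole file
            }
        }
        Console.WriteLine(records); // prints 2
    }
}
```

Because only one record's fields are held at a time, memory stays flat no matter how many rows the file has.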
Other tips
You really want to enumerate the source file and process one line at a time. I use the following extension method:
public static IEnumerable<string> EnumerateLines(this FileInfo file)
{
    using (var stream = File.Open(file.FullName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    using (var reader = new StreamReader(stream))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}
Then you can split each line on tabs and process the records one at a time. This keeps memory usage very low during parsing; you only use memory where the application needs it.
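A self-contained sketch of how that enumerator might be used to stream and split records (the sample file, path, and column layout are assumptions for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LineExtensions
{
    // Same streaming enumerator as in the answer above.
    public static IEnumerable<string> EnumerateLines(this FileInfo file)
    {
        using (var stream = File.Open(file.FullName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (var reader = new StreamReader(stream))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }
}

class StreamingDemo
{
    static void Main()
    {
        // Hypothetical two-record file; substitute your real path.
        string path = Path.Combine(Path.GetTempPath(), "records.txt");
        File.WriteAllLines(path, new[] { "1\tAlice", "2\tBob" });

        foreach (string line in new FileInfo(path).EnumerateLines())
        {
            // Only the current line is in memory at any point.
            string[] columns = line.Split('\t');
            Console.WriteLine(columns[1]);
        }
    }
}
```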
Have you tried the TextReader?
using (TextReader tr = File.OpenText(YourFile))
{
    string strLine;
    string[] arrColumns;
    while ((strLine = tr.ReadLine()) != null)
    {
        arrColumns = strLine.Split('\t');
        // Start filling your DataSet, or whatever you want to do with your data
    }
}
I found FileHelpers
FileHelpers is a free and easy-to-use .NET library to import/export data from fixed-length or delimited records in files, strings or streams.
Maybe it can help.
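A sketch of what that might look like with the FileHelpers package, using its FileHelperAsyncEngine so records are streamed one at a time rather than loaded at once. The record layout and file path here are assumptions; the field names and types must match your actual columns:

```csharp
using FileHelpers; // NuGet package: FileHelpers

// Hypothetical record layout for a two-column tab-delimited file.
[DelimitedRecord("\t")]
[IgnoreFirst(1)] // skip the header row
public class Record
{
    public int Id;
    public string Name;
}

class FileHelpersDemo
{
    static void Main()
    {
        // The async engine reads one record at a time instead of
        // materializing the whole file, which is what matters at 500K rows.
        var engine = new FileHelperAsyncEngine<Record>();
        using (engine.BeginReadFile(@"C:\data\records.txt")) // hypothetical path
        {
            foreach (Record record in engine)
            {
                // process record.Id and record.Name here
            }
        }
    }
}
```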