Question

I have a CSV file with a million records, consisting of an ID column, a date column, etc. I have to read each record from the input file (say file1) and put it in a List. While inserting, I need to check whether that particular ID already exists in the List; if it does, I replace its date with the new date from file1.

This can be done with binary search (List.BinarySearch in C#), but since the file is large I think it will be a tedious process. What is the alternative? How can I do this in an efficient way?

Thanks

Solution

You can maintain a Dictionary<TKey, TValue>, where TKey is the type of the values stored in the ID column and TValue is the type of a single record (you can define a class with one field for each column in the CSV).

Then just assign each new record to the dictionary under the ID key it belongs to. That way every key always holds the most recent record (and therefore the most recent DateTime) for that ID. It is also time-efficient: a dictionary lookup or assignment is O(1) on average, whereas List.BinarySearch costs O(log n) per lookup and inserting into a sorted List costs O(n).

Here is some example code:

using System; // for DateTime

public class Record // represents one record (row) in the CSV
{
    public int ID { get; set; }
    public DateTime DateTime { get; set; }
    // other columns, declared like the ones above
}

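Then, in the client code:

// requires: using System.Collections.Generic;
Dictionary<int, Record> dictionaryIdRecord = new Dictionary<int, Record>();

// `records` is the List<Record> read from the CSV
foreach (Record record in records)
{
    // The indexer adds a new key or overwrites the existing entry,
    // so the last record seen for each ID wins.
    dictionaryIdRecord[record.ID] = record;
}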
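
With a million rows it may be worth skipping the intermediate List and filling the dictionary while reading. The following is only a minimal sketch under stated assumptions: each line is `ID,date` with no header row and no quoted fields, and `CsvLoader` and `Load` are hypothetical names that do not appear in the question. A real CSV file may need a proper parser.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

public class Record
{
    public int ID { get; set; }
    public DateTime DateTime { get; set; }
}

public static class CsvLoader
{
    // Hypothetical helper. Assumes "ID,date" per line, no header, no quoted fields.
    public static Dictionary<int, Record> Load(TextReader reader)
    {
        var byId = new Dictionary<int, Record>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue; // skip blank lines

            string[] parts = line.Split(',');
            int id = int.Parse(parts[0], CultureInfo.InvariantCulture);
            DateTime date = DateTime.Parse(parts[1], CultureInfo.InvariantCulture);

            // Adds the ID if it is new, otherwise replaces the stored record,
            // so a later line with the same ID supplies the new date.
            byId[id] = new Record { ID = id, DateTime = date };
        }
        return byId;
    }
}
```

It could be called as `using (var reader = new StreamReader("file1.csv")) { var byId = CsvLoader.Load(reader); }`, where the file name is a placeholder. Only one line is held in memory at a time besides the dictionary itself.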

Other tips

Add them to a HashSet<T>.

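A HashSet does efficient searching and keeps only one item per key, which is close to what you want. Note that Add does not overwrite an existing equal item (it returns false and leaves the set unchanged), so to replace a record you have to Remove the old one first.
You will need to manage the equality of your items, for example by comparing on ID only.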
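
Managing equality for a HashSet<T> could look roughly like the sketch below. It assumes two records count as the same item when their IDs match. `RecordIdComparer`, `RecordSet` and `Upsert` are hypothetical names introduced here for illustration. The helper removes before adding because `HashSet<T>.Add` leaves an existing equal element in place.

```csharp
using System;
using System.Collections.Generic;

public class Record
{
    public int ID { get; set; }
    public DateTime DateTime { get; set; }
}

// Treats two records as equal when their IDs match, ignoring the date.
public class RecordIdComparer : IEqualityComparer<Record>
{
    public bool Equals(Record x, Record y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.ID == y.ID;
    }

    public int GetHashCode(Record obj)
    {
        return obj.ID.GetHashCode();
    }
}

public static class RecordSet
{
    // Hypothetical helper: stores the record, replacing any record with the same ID.
    public static void Upsert(HashSet<Record> set, Record record)
    {
        set.Remove(record); // does nothing if no record with this ID is present
        set.Add(record);
    }
}
```

The set would be created with `new HashSet<Record>(new RecordIdComparer())`. Compared with the dictionary approach this needs an extra comparer class and a remove-then-add step, so the dictionary is probably the simpler choice when the goal is to replace by ID.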

Did you consider importing this list into a database table and doing the filtering with SQL queries? As I see it, the query wouldn't be very complicated: grouping on the non-date fields and selecting max(yourdate) would certainly be a very good start. But I don't know whether a database is an option for this task.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow