Read txt file in C#

https://stackoverflow.com/questions/7981503

19-02-2021

Question

I have a txt file with the following data:

(0010,0010) : Patient's Name                : LANE^LOIS^^^

(0010,0020) : Patient ID                    : AM-0053

(0010,0030) : Patient's Birth Date          : 4/15/1982

(0010,0040) : Patient's Sex                 : F

I have to read the content line by line and create a data table with the following columns: Patient's Name, Patient ID, Patient's Birth Date, Patient's Sex. The tags (e.g. (0010,0010)) will not change; (0010,0010), for example, always represents Patient's Name. Could you please explain the logic for this task? Here is what I have so far:

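Read the file line by line.

Take the first 11 characters and check whether they are (0010,0010).

Go to the end of the line, or split the line by : and take the last element of the array (the second element is the field name, e.g. Patient's Name; the value is the third).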

Am I on the right track? How can I improve the performance?
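The three steps above can be sketched in C# roughly as follows, filling the DataTable the question asks for. This is a minimal sketch, not code from the thread: the class name `PatientTableBuilder` is hypothetical, every column is stored as a string, and it assumes that each record starts with a Patient's Name line.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class PatientTableBuilder
{
    // Column order taken from the list in the question; Tags[i] maps to Columns[i]
    static readonly string[] Tags = { "(0010,0010)", "(0010,0020)", "(0010,0030)", "(0010,0040)" };
    static readonly string[] Columns = { "Patient's Name", "Patient ID", "Patient's Birth Date", "Patient's Sex" };

    public static DataTable Build(IEnumerable<string> lines)
    {
        var table = new DataTable("Patients");
        foreach (var column in Columns)
            table.Columns.Add(column, typeof(string));

        DataRow row = null;
        // Step 1: read line by line
        foreach (var rawLine in lines)
        {
            var line = rawLine.Trim();
            if (line.Length < 11)
                continue; // blank, or too short to hold a tag

            // Step 2: the tag is the first 11 characters, e.g. "(0010,0010)"
            int index = Array.IndexOf(Tags, line.Substring(0, 11));
            if (index == -1)
                continue; // not one of the four known tags

            // Step 3: the value is the last element after splitting on ':'
            var parts = line.Split(':');
            var value = parts[parts.Length - 1].Trim();

            // Assumption: a Patient's Name line starts a new record
            if (index == 0)
            {
                row = table.NewRow();
                table.Rows.Add(row);
            }
            if (row != null)
                row[Columns[index]] = value;
        }
        return table;
    }
}
```

Fields that never appear for a patient stay `DBNull.Value` in that row, which is the usual DataTable convention for missing data.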

Solution

This little method should solve most problems. :) It loops over the lines of a string (you'll have to adjust the loop to read from a TextReader instead).

It puts everything in a list of patients.

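using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public class Program
{
    public static void Main()
    {
        var input = @"(0010,0010) : Patient's Name                : LANE^LOIS^^^
    (0010,0020) : Patient ID                    : AM-0053
    (0010,0030) : Patient's Birth Date          : 4/15/1982
    (0010,0040) : Patient's Sex                 : F
    (0010,0010) : Patient's Name                : LANE^LOIS^^^
    (0010,0020) : Patient ID                    : AM-0053
    (0010,0030) : Patient's Birth Date          : 4/15/1982
    (0010,0040) : Patient's Sex                 : F
    (0010,0010) : Patient's Name                : LANE^LOIS^^^
    (0010,0020) : Patient ID                    : AM-0053
    (0010,0030) : Patient's Birth Date          : 4/15/1982
    (0010,0040) : Patient's Sex                 : F";
        List<Patient> patients = new List<Patient>();

        Patient p = null;
        // Split on '\n'; Trim() below also removes any '\r' left by Windows line endings
        foreach (var rawLine in input.Split(new[] { '\n' }))
        {
            var line = rawLine.Trim();
            if (line.Length == 0)
                continue; // skip blank lines, such as those between the fields in the question

            // The value is the last ':'-separated part of the line
            var value = line.Split(new[] { ':' }, StringSplitOptions.RemoveEmptyEntries).Last().Trim();
            if (line.StartsWith("(0010,0010)"))
            {
                // A Patient's Name line starts a new patient
                if (p != null)
                    patients.Add(p);
                p = new Patient();
                p.Name = value;
            }
            else if (p == null)
            {
                continue; // ignore fields that appear before the first name
            }
            else if (line.StartsWith("(0010,0020)"))
            {
                p.ID = value;
            }
            else if (line.StartsWith("(0010,0030)"))
            {
                // Assumes month/day/year as in the sample; plain TryParse depends on the current culture
                DateTime birthDate;
                if (DateTime.TryParseExact(value, "M/d/yyyy", CultureInfo.InvariantCulture, DateTimeStyles.None, out birthDate))
                    p.BirthDate = birthDate;
            }
            else if (line.StartsWith("(0010,0040)") && value.Length > 0)
            {
                p.Sex = value[0];
            }
        }
        if (p != null)
            patients.Add(p);
    }
}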

public class Patient
{
    public string Name { get; set; }
    public string ID { get; set; }
    public DateTime? BirthDate { get; set; }
    public char Sex { get; set; }
}
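The answer leaves replacing the string loop with a TextReader as an exercise. One way to do that, sketched here under the assumption that the rest of the loop stays unchanged, is a small iterator over `TextReader.ReadLine()`; the name `LineSource` is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LineSource
{
    // Yields the lines of any TextReader: a StreamReader over the file, or a StringReader in tests.
    // ReadLine() strips the "\n", "\r" or "\r\n" terminator and returns null at the end of input.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
            yield return line;
    }
}
```

For example, open the file with `using (var reader = new StreamReader("patients.txt"))` (the file name is hypothetical) and iterate over `LineSource.ReadLines(reader)` in place of `input.Split(new[] {'\n'})`. For a plain file path, the built-in `File.ReadLines(path)` in `System.IO` does the same thing lazily.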

OTHER TIPS

Your approach sounds sensible. Splitting by ":" looks like a reasonable idea.

String handling of this kind is very fast, and much faster than writing the resulting records to disk or a database, so efficiency probably shouldn't be a concern.

Don't worry about performance until you know there's an issue, but in general it pays to avoid unnecessary memory allocations. If all you need is the last part of the line, use StartsWith() to test the tag without creating a substring that will later be garbage collected, then use LastIndexOf() to find where the last part starts and take a substring of just the remainder.

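string line;
// Console.ReadLine() reads standard input, so redirect the file into the program
// (or call ReadLine() on a StreamReader in the same loop)
while ((line = Console.ReadLine()) != null)
{
    // StartsWith tests the tag without allocating a substring
    if (line.StartsWith("(0010,0010)"))
    {
        var pos = line.LastIndexOf(':');

        if (pos != -1)
        {
            // Only the text after the last ':' is copied into a new string
            var part = line.Substring(pos + 1).Trim();
            // do whatever with part
        }
    }
}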
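On .NET Core 2.1 or later, the same idea can be taken further with `ReadOnlySpan<char>`, which slices and trims without allocating any new string. This is a sketch that goes beyond the answer; the helper name `TagReader.TryGetValue` is hypothetical.

```csharp
using System;

public static class TagReader
{
    // Returns true if the line starts with the tag and contains a ':'.
    // The value is a span over the original line, so no substring is allocated;
    // call ToString() only when a string is really needed.
    public static bool TryGetValue(string line, string tag, out ReadOnlySpan<char> value)
    {
        value = default;
        if (!line.StartsWith(tag, StringComparison.Ordinal))
            return false;

        ReadOnlySpan<char> span = line.AsSpan();
        int pos = span.LastIndexOf(':');
        if (pos == -1)
            return false;

        // Everything after the last ':' with surrounding whitespace trimmed
        value = span.Slice(pos + 1).Trim();
        return true;
    }
}
```

A span cannot be stored in a class field or used across `await`, so this fits the tight read-and-test loop above rather than the `Patient` objects, which still need real strings.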

Even before splitting a line, you can check whether it contains (0010,0010), or more generally a tag made of an opening parenthesis, 4 digits, a comma, another 4 digits and a closing parenthesis. If it does, split it into an array of strings, trim the spaces and populate your table row. You can use the following expression to find a tag such as (0010,0010):

Regex.IsMatch(line, @"\([0-9]{4},[0-9]{4}\)") // true if the line contains a tag such as (0010,0010); needs using System.Text.RegularExpressions;
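Going one step further than the answer, a single regular expression can both detect the tag and capture the value with groups, so no separate split is needed. This is a sketch under the assumption that every line has the form tag : field name : value; the name `TagLineParser` is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagLineParser
{
    // Group 1: the tag, e.g. "(0010,0010)"
    // Group 2: the field name (may not contain ':')
    // Group 3: the value, i.e. everything after the second ':', trimmed
    static readonly Regex LinePattern = new Regex(
        @"^\s*(\([0-9]{4},[0-9]{4}\))\s*:\s*([^:]*?)\s*:\s*(.*?)\s*$",
        RegexOptions.Compiled);

    public static bool TryParse(string line, out string tag, out string value)
    {
        Match match = LinePattern.Match(line);
        tag = match.Success ? match.Groups[1].Value : null;
        value = match.Success ? match.Groups[3].Value : null;
        return match.Success;
    }
}
```

Because the value group starts after the second colon and may itself contain ':', a value such as a time like 10:15 survives intact, whereas splitting on ':' and taking the last element would return only 15.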

That seems like a reasonable approach. I wouldn't worry about performance until it becomes a problem.

For argument's sake, let's assume you have 100,000 of these records. Write some working code first, and use a System.Diagnostics.Stopwatch to time how long 100 records take. Find the longest-running part of the process and try to shorten it. It might be (I haven't tried) reading the file line by line; if so, you could try reading the whole file in one go and splitting it on the newline character. It might also be faster to process the records in parallel using all the cores of your processor.
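The measurement suggested above could start from something like the sketch below, which times reading line by line, reading in one go and splitting, and a parallel (PLINQ) variant. The class `ReadBenchmark` and the "count the Patient's Name lines" workload are hypothetical stand-ins for the real parsing.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

public static class ReadBenchmark
{
    // Stand-in for the real per-line work: is this a Patient's Name line?
    static bool IsName(string line) => line.StartsWith("(0010,0010)", StringComparison.Ordinal);

    // Runs the action once and returns the elapsed time in milliseconds
    public static long Time(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Compare(string path)
    {
        int count = 0;
        // Line by line: File.ReadLines streams the file lazily
        long lineByLine = Time(() => count = File.ReadLines(path).Count(IsName));
        // In one go: read the whole file, then split it on the newline character
        long inOneGo = Time(() => count = File.ReadAllText(path).Split('\n').Count(IsName));
        // In parallel: AsParallel() lets PLINQ spread the per-line work over the cores
        long parallel = Time(() => count = File.ReadLines(path).AsParallel().Count(IsName));
        Console.WriteLine($"{count} names; ReadLines: {lineByLine} ms, ReadAllText+Split: {inOneGo} ms, AsParallel: {parallel} ms");
    }
}
```

The first run includes JIT compilation and a cold disk cache, so repeat each measurement a few times on a realistically large file before drawing conclusions. With this little work per line, the parallel version may not win, since reading the file is likely to dominate.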

Licensed under: CC-BY-SA with attribution
