Is there a better way to grep a word in a large file (the word can be at the beginning or the end of the file)?

StackOverflow https://stackoverflow.com/questions/18913219

29-06-2022

Question

I am playing with a puzzle on codercharts.com.

I am using StreamReader.ReadLine and a regular expression to grep the word.

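Pseudo code:

// a is the program's argument array; a[1] is the path of the file to search
using (StreamReader r = new StreamReader(a[1]))
{
    string l;
    bool found = false;
    while ((l = r.ReadLine()) != null)
    {
        // The pattern is formatted and passed to the static Regex.Matches on every line
        MatchCollection matches = Regex.Matches(l, String.Format(@"\b{0}\b", "YourWordHere"));
        found = matches.Count > 0;
        if (found)
            break;
    }
}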

It works, but when the word is near the end of the file, performance is poor, because the loop reads the file line by line from the beginning to the end.

Any suggestions?


Solution

You can make the search much faster, and roughly halve the overall runtime, by creating the regex once and reusing it:

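// Verbatim string (@"..."): in a normal C# string, "\b" would be a backspace character
var rx = new Regex(@"\bword\b", RegexOptions.Compiled);
string l;
while ((l = r.ReadLine()) != null)
    if (rx.IsMatch(l))   // IsMatch stops at the first match instead of collecting all of them
        break;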
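A self-contained version of the same idea, as a minimal sketch: the helper name `ContainsWord` is hypothetical, and it is written as a C# 9+ top-level local function. It also wraps the word in `Regex.Escape` so that characters such as `.` or `+` in the search word are matched literally, which is an addition not in the original answer.

```csharp
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper: returns true as soon as a line contains `word` as a whole word.
static bool ContainsWord(string path, string word)
{
    // Build (and compile) the regex once, outside the loop.
    // Regex.Escape makes regex metacharacters in `word` match literally.
    var rx = new Regex(@"\b" + Regex.Escape(word) + @"\b", RegexOptions.Compiled);

    using var reader = new StreamReader(path);
    string? line;
    while ((line = reader.ReadLine()) != null)
    {
        // IsMatch stops at the first match; no MatchCollection is built.
        if (rx.IsMatch(line))
            return true;
    }
    return false;
}
```

Returning from inside the loop means the rest of the file is never read once the word is found, so the cost still grows with how far into the file the word appears.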

Beyond that, you will not gain much, because most of the time is spent loading the file from disk (at least on my machine; with a faster drive you would see a bigger improvement). Results on a large file I created, with the sought word placed near the end:

Your code: 1526 ms

My code: 762 ms

No code (empty while ((l = r.ReadLine()) != null) loop): 597 ms

As you can see, merely reading the file already takes almost 600 ms.
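The answer does not show its benchmark code. A rough sketch of how the three variants could be timed with `Stopwatch` follows; the names `TimeScan` and `CompareVariants` are hypothetical, and the printed numbers will depend on your disk and on whether the OS has already cached the file (the first run is usually the slowest, so run it more than once before comparing).

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical timing helper: feeds every line of `path` to `search`
// and returns the elapsed milliseconds. Stops early if `search` returns true.
static long TimeScan(string path, Func<string, bool> search)
{
    var sw = Stopwatch.StartNew();
    using var reader = new StreamReader(path);
    string? line;
    while ((line = reader.ReadLine()) != null)
        if (search(line))
            break;
    return sw.ElapsedMilliseconds;
}

static void CompareVariants(string path, string word)
{
    string pattern = @"\b" + Regex.Escape(word) + @"\b";
    var rx = new Regex(pattern, RegexOptions.Compiled);

    // 1. Static Regex.Matches on every line (as in the question).
    long perLine = TimeScan(path, l => Regex.Matches(l, pattern).Count > 0);
    // 2. One compiled Regex, reused with IsMatch (as in the answer).
    long reused = TimeScan(path, l => rx.IsMatch(l));
    // 3. No search at all: measures only the cost of reading the lines.
    long readOnly = TimeScan(path, l => false);

    Console.WriteLine($"Matches per line: {perLine} ms");
    Console.WriteLine($"Reused regex:     {reused} ms");
    Console.WriteLine($"Read only:        {readOnly} ms");
}
```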

Now, if you could load the file once, keep it in memory, and just run the search when needed, a regex should be reasonably quick (~100 ms in the above situation). If you are searching the same file many times, this is a good idea.
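A minimal sketch of the load-once approach, assuming the file fits comfortably in memory: `File.ReadAllText` pays the disk cost a single time, and each later query runs a regex over the cached string. The function name `CreateWordSearcher` is hypothetical, and it is written as a C# 9+ top-level local function.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper: reads the whole file once and returns a function
// that answers whole-word queries against the in-memory text.
static Func<string, bool> CreateWordSearcher(string path)
{
    // Single disk read; assumes the file fits in memory.
    string text = File.ReadAllText(path);

    // \b treats line breaks as non-word characters, so the whole text
    // can be searched in one call instead of line by line.
    return word => Regex.IsMatch(text, @"\b" + Regex.Escape(word) + @"\b");
}
```

Usage: `var contains = CreateWordSearcher("big.txt");` once, then `contains("word")` as often as needed; the file is not touched again after the first call.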

Licensed under: CC-BY-SA with attribution