Question

There's a directory with a few text files. How do I count the frequency of each word in each file? A word is a sequence of characters that may contain letters, digits and underscores.

Solution

Here is a solution that should count all the word frequencies in a file:

    private void countWordsInFile(string file, Dictionary<string, int> words)
    {
        var content = File.ReadAllText(file);

        var wordPattern = new Regex(@"\w+");

        foreach (Match match in wordPattern.Matches(content))
        {
            // TryGetValue sets currentCount to 0 if the word is not in the dictionary yet.
            int currentCount = 0;
            words.TryGetValue(match.Value, out currentCount);

            currentCount++;
            words[match.Value] = currentCount;
        }
    }

You can call this code like this:

        var words = new Dictionary<string, int>(StringComparer.CurrentCultureIgnoreCase);

        countWordsInFile("file1.txt", words);

After this call, words contains every word in the file with its frequency (e.g. words["test"] returns the number of times "test" occurs in the file content). If you need to accumulate the results from more than one file, simply call the method for each file with the same dictionary. If you need separate results for each file, create a new dictionary each time and use a structure like the one @DarkGray suggested.
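Both variants described above (one shared dictionary versus one dictionary per file) can be sketched for a whole directory as follows. This is only a minimal sketch: the class name WordCounter, the helper names CountWordsPerFile and CountWordsInAllFiles, and the "*.txt" search pattern are assumptions made for illustration, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class WordCounter
{
    private static readonly Regex WordPattern = new Regex(@"\w+");

    // Same counting logic as the method above, made static for reuse.
    public static void CountWordsInFile(string file, Dictionary<string, int> words)
    {
        foreach (Match match in WordPattern.Matches(File.ReadAllText(file)))
        {
            int count;
            words.TryGetValue(match.Value, out count); // count is 0 for a new word
            words[match.Value] = count + 1;
        }
    }

    // Hypothetical helper: separate results, one dictionary per file path.
    public static Dictionary<string, Dictionary<string, int>> CountWordsPerFile(string directory)
    {
        var result = new Dictionary<string, Dictionary<string, int>>();
        foreach (var file in Directory.EnumerateFiles(directory, "*.txt")) // pattern is an assumption
        {
            var words = new Dictionary<string, int>(StringComparer.CurrentCultureIgnoreCase);
            CountWordsInFile(file, words);
            result[file] = words;
        }
        return result;
    }

    // Hypothetical helper: accumulated results, one dictionary shared by all files.
    public static Dictionary<string, int> CountWordsInAllFiles(string directory)
    {
        var words = new Dictionary<string, int>(StringComparer.CurrentCultureIgnoreCase);
        foreach (var file in Directory.EnumerateFiles(directory, "*.txt"))
        {
            CountWordsInFile(file, words);
        }
        return words;
    }
}
```

The per-file result is keyed by the path that Directory.EnumerateFiles returns, so a lookup looks like perFile[path]["test"].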

OTHER TIPS

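There is a LINQ-based alternative which imo is simpler. The key here is to use the framework's built-in File.ReadLines (which reads the file lazily, line by line) and string.Split. Note that Split() with no arguments splits on whitespace only, so punctuation stays attached to the neighboring word.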

private Dictionary<string, int> GetWordFrequency(string file)
{
    return File.ReadLines(file)
               .SelectMany(x => x.Split())
               .Where(x => x != string.Empty)
               .GroupBy(x => x)
               .ToDictionary(x => x.Key, x => x.Count());
}

To get frequencies from many files, you can add an overload that takes a params array.

private Dictionary<string, int> GetWordFrequency(params string[] files)
{
    return files.SelectMany(x => File.ReadLines(x))
                .SelectMany(x => x.Split())
                .Where(x => x != string.Empty)
                .GroupBy(x => x)
                .ToDictionary(x => x.Key, x => x.Count());
}
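If words should follow the question's definition (letters, digits and underscores) rather than whitespace-separated tokens, the same LINQ pipeline can be combined with the \w+ pattern used in the other answers. The following is a rough sketch under that assumption; the class name LinqWordFrequency, the method names FromLines and FromFiles, and the choice of StringComparer.OrdinalIgnoreCase are illustrative and not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class LinqWordFrequency
{
    // Works on any sequence of lines, so it can be tried without touching the disk.
    public static Dictionary<string, int> FromLines(IEnumerable<string> lines)
    {
        return lines
            // Every \w+ match in a line is one word.
            .SelectMany(line => Regex.Matches(line, @"\w+").Cast<Match>())
            // Group case-insensitively; the key keeps the first spelling seen.
            .GroupBy(match => match.Value, StringComparer.OrdinalIgnoreCase)
            .ToDictionary(g => g.Key, g => g.Count(), StringComparer.OrdinalIgnoreCase);
    }

    // Lazily reads the lines of all given files and counts them together.
    public static Dictionary<string, int> FromFiles(params string[] files)
    {
        return FromLines(files.SelectMany(file => File.ReadLines(file)));
    }
}
```

For example, FromLines(new[] { "Hello, world!", "hello_world hello" }) counts "Hello" twice and "world" and "hello_world" once each.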

Word counting:

int WordCount(string text)
{
  var regex = new System.Text.RegularExpressions.Regex(@"\w+");

  var matches = regex.Matches(text);
  return matches.Count;     
}

Read text from file:

string text = File.ReadAllText(filename);

Word counting structure:

class FileWordInfo
{
  public Dictionary<string, int> WordCounts = new Dictionary<string, int>();
}

List<FileWordInfo> fileInfos = new List<FileWordInfo>();

@aKzenT's answer is good. Note that it does not check explicitly whether the word already exists in the dictionary; it relies on TryGetValue leaving the count at 0 for a missing key. If you prefer an explicit check, the code can be written as follows:

private void countWordsInFile(string file, Dictionary<string, int> words)
{
    var content = File.ReadAllText(file);

    var wordPattern = new Regex(@"\w+");

    foreach (Match match in wordPattern.Matches(content))
    {
        if (!words.ContainsKey(match.Value))
            words.Add(match.Value, 1);
        else
            words[match.Value]++;
    }
}

string input = File.ReadAllText(filename);
var arr = input.Split(' ');
// Find the frequency of each word in the string.
IDictionary<string, int> dict = new Dictionary<string, int>();
foreach (var item in arr)
{
    var count = 0;
    if (dict.TryGetValue(item, out count))
        dict[item] = count + 1;
    else
        dict.Add(item, 1);
}
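
The loop above splits on single spaces only, so tabs, line breaks and punctuation end up inside the counted items. One possible variation, shown here as a sketch rather than as the original poster's method, is to split on runs of non-word characters with Regex.Split and skip the empty strings that appear at the edges. The class name SplitWordFrequency and the method name Count are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SplitWordFrequency
{
    public static IDictionary<string, int> Count(string input)
    {
        IDictionary<string, int> dict = new Dictionary<string, int>();
        // \W+ matches one or more characters that are not letters, digits or underscores.
        foreach (var item in Regex.Split(input, @"\W+"))
        {
            // Leading or trailing separators produce empty strings; ignore them.
            if (item.Length == 0)
                continue;

            int count;
            dict.TryGetValue(item, out count); // count is 0 for a new word
            dict[item] = count + 1;
        }
        return dict;
    }
}
```

With the input "a, b\nc_1  a." this counts "a" twice and "b" and "c_1" once each. Unlike the dictionaries in the earlier answers, this one is case-sensitive because no comparer is passed.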
Licensed under: CC-BY-SA with attribution