Count occurrences of identical strings in a massive string list
15-04-2021
Question
I have over 600k lines of strings. I want to group identical strings and get their counts.
For example:
i go to school
i like music
i like games
i like music
i like music
i like games
i like music
So the result will be:
i go to school , 1
i like games , 2
i like music , 4
How can I do that in the fastest possible way?
Solution
The GroupBy method is what you want. You'll need your strings to be in a list or something that implements IEnumerable<string>. The File.ReadLines method suggested by spender will return an IEnumerable<string> that reads the file line by line.
var stringGroups = File.ReadLines("filename.txt").GroupBy(s => s);
foreach (var stringGroup in stringGroups)
Console.WriteLine("{0} , {1}", stringGroup.Key, stringGroup.Count());
If you want them in order of least to most (as in your example), just add an OrderBy:
...
foreach (var stringGroup in stringGroups.OrderBy(g => g.Count()))
...
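Putting the two pieces together, a minimal sketch might look like the following. The class and method names (GroupByDemo, CountByGroup) are made up for illustration, and the ThenBy tie-breaker is an addition that is not part of the answer above; it only makes the order of equally frequent strings predictable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class GroupByDemo
{
    // Hypothetical helper: returns (string, count) pairs, least frequent first.
    public static List<KeyValuePair<string, int>> CountByGroup(IEnumerable<string> lines)
    {
        return lines
            .GroupBy(s => s)                               // one group per distinct string
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .OrderBy(p => p.Value)                         // ascending by count
            .ThenBy(p => p.Key, StringComparer.Ordinal)    // assumption: ties sorted by text
            .ToList();
    }
}
```

It can be fed directly from the file, e.g. GroupByDemo.CountByGroup(File.ReadLines("filename.txt")), and each returned pair printed with Console.WriteLine("{0} , {1}", p.Key, p.Value).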
Other answers
You can use LINQ to implement it:
IEnumerable<string> stringSource = File.ReadLines("C:\\file.txt");
var result = stringSource
.GroupBy(str => str)
.Select(group => new {Value = group.Key, Count = group.Count()})
.OrderBy(item => item.Count)
.ToList();
foreach(var item in result)
{
// item.Value - string value
// item.Count - count
}
You can try this:
var groupedLines = System.IO.File.ReadAllLines(@"C:\temp\samplelines.txt").GroupBy(x=>x);
groupedLines.ToList().ForEach(y => Console.WriteLine("Content: {0} - Occurrences: {1}", y.Key, y.Count()));
Another, "old-school" approach is to iterate over all lines and add each one to a Dictionary if it is not already present, or increment its count if it is. The key is the line and the value is the count.
var d = new Dictionary<string, Int32>();
foreach (var line in File.ReadAllLines(@"C:\Temp\FileName.txt"))
if (d.ContainsKey(line)) d[line]++; else d.Add(line, 1);
The advantage is that it also works on earlier framework versions that do not have LINQ.
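Since the question asks about speed, here is a sketch of a variation on the same dictionary idea. It uses TryGetValue instead of ContainsKey, which avoids one lookup per line, and it accepts any IEnumerable<string>, so it can be fed from File.ReadLines and never hold the whole file in memory. LineCounter and CountLines are illustrative names, and whether this actually beats GroupBy on 600k lines would need to be measured.

```csharp
using System;
using System.Collections.Generic;

static class LineCounter
{
    // Hypothetical helper: counts identical lines in a single pass.
    public static Dictionary<string, int> CountLines(IEnumerable<string> lines)
    {
        // Ordinal comparison: two lines are equal only if they match character for character.
        var counts = new Dictionary<string, int>(StringComparer.Ordinal);
        foreach (var line in lines)
        {
            int current;
            counts.TryGetValue(line, out current); // current is 0 when the line is new
            counts[line] = current + 1;
        }
        return counts;
    }
}
```

For example, LineCounter.CountLines(File.ReadLines(@"C:\Temp\FileName.txt")) returns the dictionary of counts; unlike GroupBy, it keeps only one copy of each distinct line plus an integer.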