Question

I have over 600k lines of strings. I want to group identical strings and get their counts.

For example:

i go to school
i like music
i like games
i like music
i like music
i like games
i like music

The result should be:

i go to school , 1
i like games , 2
i like music , 4

How can I do that in the fastest possible way?


Solution

The GroupBy method is what you want. You'll need your strings to be in a list or something else that implements IEnumerable<string>. The File.ReadLines method suggested by spender returns an IEnumerable<string> that reads the file lazily, line by line.

var stringGroups = File.ReadLines("filename.txt").GroupBy(s => s);
foreach (var stringGroup in stringGroups)
    Console.WriteLine("{0} , {1}", stringGroup.Key, stringGroup.Count());

If you want them ordered from least to most frequent (as in your example), just add an OrderBy:

...
foreach (var stringGroup in stringGroups.OrderBy(g => g.Count()))
    ...
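
As a quick check, the same GroupBy/OrderBy pipeline can be run on the sample lines from the question, held in an array instead of a file. This is only a sketch: the names GroupDemo and CountAscending are made up for illustration and are not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupDemo
{
    // Hypothetical helper: groups identical lines and returns
    // (line, count) pairs ordered from least to most frequent.
    public static List<KeyValuePair<string, int>> CountAscending(IEnumerable<string> lines)
    {
        return lines
            .GroupBy(s => s)
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .OrderBy(p => p.Value)
            .ToList();
    }

    public static void Main()
    {
        // Sample data copied from the question.
        var sample = new[]
        {
            "i go to school", "i like music", "i like games", "i like music",
            "i like music", "i like games", "i like music"
        };

        foreach (var pair in CountAscending(sample))
            Console.WriteLine("{0} , {1}", pair.Key, pair.Value);
    }
}
```

Projecting each group to a pair before sorting means Count() is evaluated once per group, rather than once while ordering and again while printing.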

Other suggestions

You can use LINQ to implement it:

IEnumerable<string> stringSource = File.ReadLines("C:\\file.txt");

var result = stringSource
    .GroupBy(str => str)
    .Select(group => new {Value = group.Key, Count = group.Count()})
    .OrderBy(item => item.Count)
    .ToList();

foreach(var item in result)
{
    // item.Value - string value
    // item.Count - count
}

You can try this:


var groupedLines = System.IO.File.ReadAllLines(@"C:\temp\samplelines.txt").GroupBy(x=>x);
groupedLines.ToList().ForEach(y => Console.WriteLine("Content: {0} - Occurrences: {1}", y.Key, y.Count()));

Another, "old-school" approach is to iterate over all lines and add each one to a Dictionary if it is not already present. The key is the line and the value is the count.

var d = new Dictionary<string, Int32>();
foreach (var line in File.ReadAllLines(@"C:\Temp\FileName.txt"))
     if (d.ContainsKey(line)) d[line]++; else d.Add(line, 1);

The advantage is that it also works on earlier framework versions.
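
A possible variant of the same loop, sketched as a method that accepts any sequence of lines. It uses TryGetValue so that a repeated line costs one lookup plus one write, instead of the ContainsKey check followed by a read and a write. The names LineCounter and CountLines are illustrative assumptions, not something from the answer above.

```csharp
using System;
using System.Collections.Generic;

public static class LineCounter
{
    // Hypothetical helper: counts how many times each distinct line occurs.
    public static Dictionary<string, int> CountLines(IEnumerable<string> lines)
    {
        var counts = new Dictionary<string, int>();
        foreach (var line in lines)
        {
            int n;
            counts.TryGetValue(line, out n); // n is left at 0 when the line is new
            counts[line] = n + 1;            // adds the key or overwrites the old count
        }
        return counts;
    }

    public static void Main()
    {
        var counts = CountLines(new[] { "i like music", "i like games", "i like music" });
        foreach (var pair in counts)
            Console.WriteLine("{0} , {1}", pair.Key, pair.Value);
    }
}
```

To count a file, pass File.ReadAllLines(path) as the argument, or File.ReadLines(path) on .NET 4 and later to avoid loading the whole file first. Dictionary does not guarantee an enumeration order, so sort the pairs if the output order matters.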

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow