How to make word freq counter more efficient?
-
09-12-2019 - |
Question
I have written this F# code to count word frequencies in a list and return a tuple to C#. Could you tell me how can I make the code more efficient or shorter?
let rec internal countword2 (tail : string list) wrd ((last : string list), count) =
match tail with
| [] -> last, wrd, count
| h::t -> countword2 t wrd (if h = wrd then last, count+1 else last @ [h], count)
let internal countword1 (str : string list) wrd =
let temp, wrd, count = countword2 str wrd ([], 0) in
temp, wrd, count
let rec public countword (str : string list) =
match str with
| [] -> []
| h::_ ->
let temp, wrd, count = countword1 str h in
[(wrd, count)] @ countword temp
Solution
If you want to count word frequencies in a string list, your approach seems to be overkill. Seq.groupBy
is well-fitted for this purpose:
let public countWords (words: string list) =
words |> Seq.groupBy id
|> Seq.map (fun (word, sq) -> word, Seq.length sq)
|> Seq.toList
OTHER TIPS
Even pad's version can be made more efficient and concise:
let countWords = Seq.countBy id
Example:
countWords ["a"; "a"; "b"; "c"] //returns: seq [("a", 2); ("b", 1); ("c", 1)]
Your solution iterates over the input list several times, for every new word that it founds. Instead of doing that, you could iterate over the list just once and build a dictionary that holds the number of all occurrences for every word.
To do this in a functional style, you can use F# Map
, which is an immutable dictionary:
let countWords words =
// Increment the number of occurrences of 'word' in the map 'counts'
// If it isn't already in the dictionary, add it with count 1
let increment counts word =
match Map.tryFind word counts with
| Some count -> Map.add word (count + 1) counts
| _ -> Map.add word 1 counts
// Start with an empty map and call 'increment'
// to add all words to the dictionary
words |> List.fold increment Map.empty
You can also implement the same thing in an imperative style, which is going to be more efficient, but less elegant (and you don't get all benefits of functional style). However, standard mutable Dictionary
can be used nicely from F# too (this is going to be similar to C# version, so I won't write it here).
Finally, if you want a simple solution using just standard F# functions, you can use Seq.groupBy
as suggested by pad. This would be probably almost as efficient as the Dictionary
based version. But then, if you're just learning F# then writing a few recursive functions like countWords
yourself is a great way to learn!
To give you some comments about your code - the complexity of your approach is slightly higher, but that should probably be fine. There are however some common isses:
In your
countword2
function, you haveif h = wrd then ... else last @ [h], count
. The calllast @ [h]
is inefficient, because it needs to clone the entire listlast
. Instead of this, you could just writeh::last
to add the word to the beginning, because the order does not matter.On the last line, you're using
@
again in[(wrd, count)] @ countword temp
. This is not necessary. If you're adding single element to the beginning of list, you should use:(wrd,count)::(countword temp)
.