Question

This is my code for calculating word frequency:

  word_arr= ["I", "received", "this", "in", "email", "and", "found", "it", "a", "good", "read", "to", "share......", "Yes,", "Dr", "M.", "Bakri", "Musa", "seems", "to", "know", "what", "is", "happening", "in", "Malaysia.", "Some", "of", "you", "may", "know.", "He", "is", "a", "Malay",  "extra horny", "horny nor", "nor their", "their babes", "babes are", "are extra", "extra SEXY..", "SEXY.. .", ". .", ". .It's", ".It's because", "because their", "their CONDOMS", "CONDOMS are", "are Made", "Made In", "In China........;)", "China........;) &&"]

arr_stop_kwd = ["a", "and"]

frequencies = Hash.new(0)
word_arr.each do |word|
  key = word.downcase  # downcase once and reuse it for the check and the count
  if !arr_stop_kwd.include?(key) && !word.match('&&')
    frequencies[key] += 1
  end
end

With 100k words this takes 9.03 seconds, which is too much time. Is there a faster way to calculate the frequencies?

Thanks in advance.


Solution

Take a look at the Facets gem.

You can do something like this using its frequency method:

require 'facets'
frequencies = (word_arr-arr_stop_kwd).frequency

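Note that the stop words can be subtracted from word_arr with Array#-. Refer to the Array documentation. The subtraction is case-sensitive, and this version neither downcases the words nor skips entries containing "&&", so downcase and filter word_arr first if you need the same counts as the original loop.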
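If you would rather not add a gem, the same counting can be sketched in plain Ruby while keeping the question's lowercasing and "&&" filter. This is only a sketch: `word_frequencies` is a hypothetical helper name, the sample data is made up, and putting the stop words in a Set is an assumption about what may help with a long stop list, not something measured against the 100k case.

```ruby
require 'set'

# Hypothetical helper (not from Facets or the original post).
# Counts lowercased words, skipping stop words and entries containing "&&".
def word_frequencies(words, stop_words)
  # Assumption: a Set keeps the membership check cheap when the stop list grows.
  stop_set = stop_words.map(&:downcase).to_set
  frequencies = Hash.new(0)

  words.each do |word|
    next if word.include?('&&')     # same effect as word.match('&&') in the question
    key = word.downcase             # downcase once per word
    next if stop_set.include?(key)
    frequencies[key] += 1
  end

  frequencies
end

# Made-up sample data for illustration.
sample = ["I", "a", "A", "and", "good", "Good", "read", "China........;) &&"]
result = word_frequencies(sample, ["a", "and"])
# result == {"i"=>1, "good"=>2, "read"=>1}
```

To check whether this is actually faster on your data, time both versions with Benchmark.realtime from the standard library.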

Licensed under: CC-BY-SA with attribution