Frage

This is my code for calculate word frequency

  word_arr= ["I", "received", "this", "in", "email", "and", "found", "it", "a", "good", "read", "to", "share......", "Yes,", "Dr", "M.", "Bakri", "Musa", "seems", "to", "know", "what", "is", "happening", "in", "Malaysia.", "Some", "of", "you", "may", "know.", "He", "is", "a", "Malay",  "extra horny", "horny nor", "nor their", "their babes", "babes are", "are extra", "extra SEXY..", "SEXY.. .", ". .", ". .It's", ".It's because", "because their", "their CONDOMS", "CONDOMS are", "are Made", "Made In", "In China........;)", "China........;) &&"]

arr_stop_kwd=["a","and"] 

 frequencies = Hash.new(0)
   word_arr.each { |word|
      if !arr_stop_kwd.include?(word.downcase) && !word.match('&&')
        frequencies["#{word.downcase}"] += 1
      end
   }

when i have 100k data it will take 9.03 seconds,that,s to much time can i calculate any another way

Thx in advance

War es hilfreich?

Lösung

Take a look at Facets gem

You can do something like this using the frequency method

require 'facets'
frequencies = (word_arr-arr_stop_kwd).frequency

Note that stop word can be subtracted from the word_arr. Refer to Array Documentation.

Lizenziert unter: CC-BY-SA mit Zuschreibung
Nicht verbunden mit StackOverflow
scroll top