Question

I have read a file and split its contents into an array of words:

file1 = File.open("spam1.txt","rb")
file1_contents = file1.read
file1 = file1_contents.split(' ')

I can count the frequency of words, using a hash, and sort them according to the frequency of the word:

freqs1 = Hash.new(0)
file1.each { |word| freqs1[word] +=1}
freqs1 = freqs1.sort_by {|x,y| y}
freqs1.reverse!

I can also output the results to the user like this:

freqs1.each{|word, freq| puts word + ' ' + freq.to_s}

I want to display a message to the user, if the array file1 or hash freqs1 contains certain words multiple times.

I had a (bad) idea to loop through the freqs1 hash and display the appropriate message to the user:

freqs1.each{|word,freq|
    if ((word == ('business' || 'fund' || 'funds' || 'account' ||'transfer' || 'money')) && freq > 2)  || (word == 'Iraq' && freq > 1 )  then
      puts "File 1 is suspected as spam mail - suspicious word frequency"
    else
      puts "File 1 does not appear to be spam email"
    end
}

However this was silly of me as this acts on each element of the hash.

How can I display a message to the user if words like business, fund, funds, account, etc. appear more than twice?

Thanks for any help.


Solution

If you're just looking to improve that final statement, try this (untested, but it should work):

bad_words = %w{business fund funds account transfer money}
is_spam = freqs1.any? do |word, freq| 
  (freq > 2 && bad_words.include?(word)) || (word == 'Iraq' && freq > 1)
end

if is_spam
  puts "File 1 is suspected as spam mail - suspicious word frequency"
else
  puts "File 1 does not appear to be spam email"
end

Enumerable#any? does most of the work for you: it returns true as soon as the block is truthy for one element. Extracting the list of bad words also aids readability.
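
For anyone who wants to try this without a spam1.txt on hand, here is a minimal self-contained sketch of the same idea. The helper name spam?, the constant BAD_WORDS and the sample sentence are made up for illustration and are not part of the answer above.

```ruby
# Assumed names for illustration only: BAD_WORDS and spam? are hypothetical.
BAD_WORDS = %w[business fund funds account transfer money].freeze

# Returns true if any watched word crosses its threshold.
def spam?(words)
  # Count occurrences of each word, defaulting missing keys to 0.
  freqs = words.each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }

  # any? stops at the first pair for which the block is truthy.
  freqs.any? do |word, freq|
    (freq > 2 && BAD_WORDS.include?(word)) || (word == 'Iraq' && freq > 1)
  end
end

sample = 'please transfer the money money money to my account'.split(' ')
puts spam?(sample)           # "money" appears three times
puts spam?(%w[hello world])  # no watched words at all
```

As a side note on the loop in the question: word == ('business' || 'fund' || ...) only ever compares against 'business', because || returns its first truthy operand. That is why include? on an array is used here instead.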

Other tips

I would do something like this:

word_filter = [
 {count: 2, words: ['business','fund','funds','account','transfer','money']},
 {count: 1, words: ['iraq']}
]

alert        = "File 1 is suspected as spam mail - suspicious word frequency"
calm_message = "File 1 does not appear to be spam email"

grouped_words = file1.group_by{|x|x}.map{|x,array|[x,array.size]}

appears_to_be_spam = grouped_words.any?{ |word,count|
  word_filter.any? do |filter|
    filter[:words].include?(word.downcase) &&  count > filter[:count]
  end
}

puts appears_to_be_spam ? alert : calm_message
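
One thing to watch in the snippet above: the words are grouped before they are downcased, so "Iraq" and "iraq" end up with separate counts. If case-insensitive counting is what you want (an assumption, the question does not say), a sketch along these lines normalizes first. The names WORD_FILTER and suspicious? are hypothetical.

```ruby
# Assumed names for illustration only: WORD_FILTER and suspicious? are hypothetical.
WORD_FILTER = [
  { count: 2, words: %w[business fund funds account transfer money] },
  { count: 1, words: %w[iraq] }
].freeze

# Downcase before counting so that "Iraq" and "iraq" share one count.
def suspicious?(words, filters = WORD_FILTER)
  counts = words.map(&:downcase).group_by { |w| w }.map { |w, group| [w, group.size] }

  counts.any? do |word, count|
    filters.any? { |f| f[:words].include?(word) && count > f[:count] }
  end
end

puts suspicious?(%w[Iraq news from iraq])  # two mentions, threshold is 1
puts suspicious?(%w[Iraq news])            # one mention only
```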
Licensed under: CC-BY-SA with attribution