Question

I have read a file and split its contents into an array of words:

# File.read opens, reads and closes the file in one call
file1_contents = File.read("spam1.txt", mode: "rb")
file1 = file1_contents.split(' ')  # array of words
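In Ruby, split(' ') with a single-space argument is a special case: it splits on any run of whitespace (spaces, tabs, newlines), exactly like split with no argument. Punctuation stays attached to words, though, so "money," and "money" would be counted as different words. A minimal sketch of one way to strip it, assuming a "word" is a run of letters, digits and apostrophes; the words_in helper is hypothetical.

```ruby
# Hypothetical helper: turn raw text into an array of words.
# split with no argument splits on runs of whitespace;
# scan keeps only runs of letters, digits and apostrophes, so "money," becomes "money".
def words_in(text, strip_punctuation: false)
  if strip_punctuation
    text.scan(/[[:alnum:]']+/)
  else
    text.split
  end
end

sample = "Transfer the money, please.\nThe money is in the account."
p words_in(sample)                           # "money," and "account." keep their punctuation
p words_in(sample, strip_punctuation: true)  # punctuation removed

# With a real file (assumes spam1.txt exists in the current directory):
# file1 = words_in(File.read("spam1.txt"), strip_punctuation: true)
```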

I can count word frequencies using a hash and sort the words by frequency:

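freqs1 = Hash.new(0)                           # missing keys default to 0
file1.each { |word| freqs1[word] += 1 }
freqs1 = freqs1.sort_by { |word, freq| freq }  # array of [word, freq] pairs, ascending
freqs1.reverse!                                # most frequent first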
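For reference, the counting and sorting can also be written without the separate reverse! step. This sketch uses Array#tally, which needs Ruby 2.7 or later (on older versions the Hash.new(0) loop above does the same job), and sorts by negated frequency so the most frequent words come first. Note that sort_by returns an array of [word, count] pairs, not a hash; call to_h if hash lookups are still needed. The sample words stand in for the real file.

```ruby
words = %w[money fund money account money fund Iraq]

# Count occurrences of each word (Ruby 2.7+); equivalent to the Hash.new(0) loop.
freqs = words.tally

# Sort by descending frequency; the result is an array of [word, count] pairs.
sorted = freqs.sort_by { |_word, count| -count }

p sorted.first          # the most frequent [word, count] pair
p sorted.to_h["fund"]   # convert back to a hash for direct lookups
```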

I can also output the results to the user like this:

freqs1.each { |word, freq| puts word + ' ' + freq.to_s }
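String interpolation avoids the explicit to_s and + concatenation, and first(n) limits the output to the top entries. A small sketch, assuming freqs1 is the sorted array of [word, count] pairs built above (sample data stands in for the real file here); the column width of 15 and the limit of 10 are arbitrary choices.

```ruby
# Sample stand-in for the sorted [word, count] pairs from the question.
freqs1 = [["money", 3], ["fund", 2], ["Iraq", 1]]

# One line per word for the top 10; format pads the word to 15 characters.
lines = freqs1.first(10).map { |word, freq| format("%-15s %d", word, freq) }
puts lines

# Plain interpolation works too:
freqs1.each { |word, freq| puts "#{word} #{freq}" }
```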

I want to display a message to the user if the array file1 or the hash freqs1 contains certain words multiple times.

I had a (bad) idea to loop through the freqs1 hash and display the appropriate message to the user:

freqs1.each{|word,freq|
    if ((word == ('business' || 'fund' || 'funds' || 'account' ||'transfer' || 'money')) && freq > 2)  || (word == 'Iraq' && freq > 1 )  then
      puts "File 1 is suspected as spam mail - suspicious word frequency"
    else
      puts "File 1 does not appear to be spam email"
    end
}

However, this was silly of me: the if/else runs once for each element of the hash, so a message is printed for every single word rather than once for the whole file.
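There is also a second, quieter bug in the condition above. In Ruby, a || b returns its first truthy operand, and every string is truthy (only nil and false are falsy), so ('business' || 'fund' || ...) evaluates to just 'business'; the other words are never compared. A quick check:

```ruby
# || returns its first truthy operand, so the whole chain collapses to 'business'.
chained = ('business' || 'fund' || 'funds' || 'account' || 'transfer' || 'money')
p chained             # prints "business"
p 'fund' == chained   # prints false: 'fund' can never match

# Testing membership in a list is what the condition was meant to do.
suspicious = %w[business fund funds account transfer money]
p suspicious.include?('fund')  # prints true
```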

How can I display a single message to the user if words like business, fund, funds, account, etc. appear more than twice?

Thanks for any help.


Solution

If you're just looking to improve that final statement, try this (untested, but it should work):

bad_words = %w{business fund funds account transfer money}
is_spam = freqs1.any? do |word, freq| 
  (freq > 2 && bad_words.include?(word)) || (word == 'Iraq' && freq > 1)
end

if is_spam
  puts "File 1 is suspected as spam mail - suspicious word frequency"
else
  puts "File 1 does not appear to be spam email"
end

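Enumerable#any? does most of the work for you: it returns true as soon as the block returns true for one element, so you get a single yes/no answer and print only one message. Extracting the list of bad words into its own variable also aids readability.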
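To make the check reusable (for a second file and so on), the same logic can be wrapped in a method. The spam? name and its optional parameters are hypothetical; the thresholds mirror the question (more than two occurrences of a listed word, or more than one "Iraq"). Because any? stops at the first pair for which the block returns true, the scan ends as soon as one suspicious word is found.

```ruby
BAD_WORDS = %w[business fund funds account transfer money].freeze

# Hypothetical helper: true if any listed word appears more than `limit` times,
# or 'Iraq' appears more than once. `freqs` can be a hash or an array of
# [word, count] pairs; any? yields each pair to the block in both cases.
def spam?(freqs, bad_words = BAD_WORDS, limit = 2)
  freqs.any? do |word, freq|
    (freq > limit && bad_words.include?(word)) || (word == 'Iraq' && freq > 1)
  end
end

ham  = { "hello" => 5, "money" => 2 }
spam = [["money", 3], ["hello", 1]]

puts spam?(ham)  ? "suspected as spam" : "does not appear to be spam"
puts spam?(spam) ? "suspected as spam" : "does not appear to be spam"
```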

OTHER TIPS

I would do something like this:

word_filter = [
 {count: 2, words: ['business','fund','funds','account','transfer','money']},
 {count: 1, words: ['iraq']}
]

alert        = "File 1 is suspected as spam mail - suspicious word frequency"
calm_message = "File 1 does not appear to be spam email"

# group by the downcased word so "Business" and "business" are counted together
grouped_words = file1.group_by(&:downcase).map { |word, array| [word, array.size] }

appears_to_be_spam = grouped_words.any?{ |word,count|
  word_filter.any? do |filter|
    filter[:words].include?(word.downcase) &&  count > filter[:count]
  end
}

puts appears_to_be_spam ? alert : calm_message
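A variation on the filter-table idea, as a sketch: store one threshold per word in a hash, count words case-insensitively with tally (Ruby 2.7+), and collect the words that exceeded their threshold, so the message can also say which words triggered it. The THRESHOLDS table and the suspicious_words method are hypothetical names; the numbers are the ones from the question.

```ruby
# Hypothetical table: lowercase word => maximum allowed count.
THRESHOLDS = {
  "business" => 2, "fund" => 2, "funds" => 2, "account" => 2,
  "transfer" => 2, "money" => 2, "iraq" => 1
}.freeze

# Returns the words whose case-insensitive count exceeds their threshold.
def suspicious_words(words, thresholds = THRESHOLDS)
  counts = words.map(&:downcase).tally
  counts.select { |word, count| thresholds.key?(word) && count > thresholds[word] }.keys
end

words = %w[Business business money business Iraq iraq hello]
hits  = suspicious_words(words)

if hits.empty?
  puts "File 1 does not appear to be spam email"
else
  puts "File 1 is suspected as spam mail - suspicious word frequency (#{hits.join(', ')})"
end
```

Keeping the thresholds in one hash means a new word or limit is a one-line change, and an empty result doubles as the "not spam" answer.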

Licensed under: CC-BY-SA with attribution