Why not start with a simple scan, using a regular expression to pull all the words out of the text? http://ruby-doc.org/core-2.1.0/String.html#method-i-scan
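For English, the regular expression can be as simple as [\w']+: one or more word characters (\w on its own matches only a single character), plus any special characters you want to keep inside words, like the ' you mention.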
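As a rough sketch of that idea, assuming the pattern [\w']+ is good enough for your text (the method name extract_words and the sample sentence are just made up for illustration):

```ruby
# Hypothetical helper: returns every run of word characters and apostrophes.
# Note that \w also matches digits and underscores, so "abc_123" counts as one word.
def extract_words(text)
  text.scan(/[\w']+/)
end

words = extract_words("Don't panic: it's only a test.")
p words
# => ["Don't", "panic", "it's", "only", "a", "test"]
```

One caveat with this pattern: words wrapped in single quotes keep their quotes ('hello' comes back as "'hello'"), so you may need to strip leading and trailing apostrophes afterwards if your text uses them for quoting.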