Question

I am looking for a relatively quick way to check whether words are misspelled, using either a gem or an API.

I've tried using several gems -- raspell, ffi-aspell, hunspell-ffi, spell_cheker, and spellchecker -- and each has a different error.

I'm pretty new to Ruby and hoping for a simple solution that doesn't involve building something from scratch. I'm processing a lot of short text files and want to calculate the percentage of misspelled words in each.

When trying ffi-aspell, I get the following error:

/Users/ntaylorthompson/.rvm/gems/ruby-1.9.2-p320/gems/ffi-aspell-0.0.3/lib/ffi/aspell/speller.rb:121: [BUG] Segmentation fault
ruby 1.9.2p320 (2012-04-20 revision 35421) [x86_64-darwin11.4.0]

-- control frame ----------
c:0005 p:---- s:0019 b:0019 l:000018 d:000018 CFUNC  :speller_check
c:0004 p:0113 s:0013 b:0013 l:000012 d:000012 METHOD /Users/ntaylorthompson/.rvm/gems/ruby-1.9.2-p320/gems/ffi-aspell-0.0.3/lib/ffi/aspell/speller.rb:121
c:0003 p:0049 s:0007 b:0007 l:0005a8 d:0005d0 EVAL   ffi-aspell_test.rb:5
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:0000 s:0002 b:0002 l:0005a8 d:0005a8 TOP   
---------------------------
-- Ruby level backtrace information ----------------------------------------
ffi-aspell_test.rb:5:in `<main>'
/Users/ntaylorthompson/.rvm/gems/ruby-1.9.2-p320/gems/ffi-aspell-0.0.3/lib/ffi/aspell/speller.rb:121:in `correct?'
/Users/ntaylorthompson/.rvm/gems/ruby-1.9.2-p320/gems/ffi-aspell-0.0.3/lib/ffi/aspell/speller.rb:121:in `speller_check'

-- C level backtrace information -------------------------------------------

[NOTE]
You may have encountered a bug in the Ruby interpreter or extension libraries.
Bug reports are welcome.
For details: http://www.ruby-lang.org/bugreport.html

Abort trap: 6

I'd appreciate either (1) a suggestion of an alternative approach to those above or (2) a recommendation of which to use of the 5 gems above -- so I can at least spend time debugging the best option.


Solution

raspell is no longer maintained, so ffi-aspell is a good option if you have the libaspell headers available.

If you can't get the libraries to work, you can just shell out to the aspell binary. The following method will do just that (unit tests included):

# Returns the fraction (0.0 to 1.0) of misspelled words in the document
#
def spellcheck(filename)
  fail "File #{filename} does not exist" unless File.exist?(filename)

  words = Float(`wc -w #{filename}`.split.first)
  wrong = Float(`cat #{filename} | aspell --list | wc -l`.split.first)

  wrong / words
end

if $0 == __FILE__
  require 'minitest/autorun'
  require 'tempfile'

  describe :spellcheck do
    def write(str)
      @file.write str
      @file.flush # push the text to disk so wc and aspell can see it
    end

    before do
      @file = Tempfile.new('document')
    end

    it 'fails when given a bad path' do
      -> { spellcheck('/tmp/does/not/exist') }.must_raise RuntimeError
    end

    it 'returns 0.0 if there are no misspellings' do
      write 'The quick brown fox'
      spellcheck(@file.path).must_equal 0.0
    end

    it 'returns 0.5 if 2/4 words are misspelled' do
      write 'jumped over da lacie'
      spellcheck(@file.path).must_be_close_to 0.5, 1e-8
    end

    it 'returns 1.0 if everything is misspelled' do
      write 'Da quyck bown foxx jmped oer da lassy dogg'
      spellcheck(@file.path).must_be_close_to 1.0, 1e-8
    end

    after do
      @file.close
      @file.unlink
    end
  end
end

spellcheck() assumes you have cat, wc, and aspell on your path, and that the default dictionary is what you want to use. The unit test is for Ruby 1.9 only -- if you're running 1.8, just delete it.
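Because spellcheck() interpolates the file name into a shell command, paths containing spaces or shell metacharacters can break it, and an empty file yields 0.0 / 0.0. A minimal sketch of a variant that avoids both, assuming Ruby 2.0 or later (for keyword arguments) and the standard Open3 library; the name misspelled_fraction and its command parameter are illustrative and not part of the answer above:

```ruby
require 'open3'

# Fraction (0.0 to 1.0) of whitespace-separated words that the checker flags.
# `command` is an assumption: any program that reads text on stdin and prints
# one misspelled word per line, as `aspell list` does.
def misspelled_fraction(text, command: %w[aspell list])
  total = text.split.size
  return 0.0 if total.zero? # avoid 0.0 / 0.0 on empty documents

  # The command is passed as separate arguments, so no shell is involved and
  # nothing in the text or the file name can be read as shell syntax.
  output, status = Open3.capture2(*command, stdin_data: text)
  fail "#{command.first} exited with #{status.exitstatus}" unless status.success?

  output.lines.count.fdiv(total)
end

# Hypothetical usage over a directory of short files:
#   Dir.glob('docs/*.txt').each do |path|
#     puts "#{path}: #{misspelled_fraction(File.read(path))}"
#   end
```

Like spellcheck(), this counts every flagged occurrence against a whitespace-based word count, so the two should agree on ordinary text.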

OTHER TIPS

As jmdeldin said, raspell is no longer maintained; ffi-aspell is a fork of it.

I played with it for a few minutes, and it's quite easy to use:

  1. Instantiate an FFI::Aspell::Speller object, specifying the language
  2. Check if a word is correct using speller.correct?(word)
  3. Get a list of suggestions for a word using speller.suggestions(word)

NOTE: The biggest limitation I've found so far is that the speller's interface works on single words only. If you want to spell check a whole document, you'll need to split it into words yourself. This may not be trivial, especially if your input is HTML...

(It depends on aspell of course so you need to install it using brew install aspell or your preferred package manager)
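
The word-splitting step mentioned in the note can be sketched roughly as follows. This is only an illustration: the helper names words_in and fraction_rejected are made up here, the regular expression is one possible definition of a "word" for plain text, and the speller is passed in as any object that responds to correct?(word), such as the FFI::Aspell::Speller described above.

```ruby
# Split free text into candidate words. The pattern is an assumption: runs of
# letters, optionally joined by internal apostrophes (as in "don't").
def words_in(text)
  text.scan(/[[:alpha:]]+(?:'[[:alpha:]]+)*/)
end

# Fraction (0.0 to 1.0) of words that the speller rejects. `speller` is any
# object responding to `correct?(word)`.
def fraction_rejected(text, speller)
  words = words_in(text)
  return 0.0 if words.empty?

  words.count { |word| !speller.correct?(word) }.fdiv(words.size)
end
```

With ffi-aspell installed, something like fraction_rejected(File.read(path), FFI::Aspell::Speller.new('en_US')) should give the per-document figure; HTML input would still need its tags stripped before being passed in.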

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow