Full Text Search on Heroku: database and/or indexer selection?

https://stackoverflow.com/questions/9305516

25-10-2019

Question

I am looking to implement full text search (free as in beer) for a small application on Heroku (minimal number of users, limited dataset). However, I am struggling to find a good pattern for doing so. One option is to stay within Xeround's 10 MB limit while that lasts (we may exceed it in the near future); the other is to somehow roll my own full text search on top of MongoDB or CouchDB.

The documents in this application are archived emails from a mailing list that I wish to make searchable. There are approximately 10k such emails, plain text, roughly 700 bytes each.

I would prefer fuzzy search capabilities, hence the push for Whoosh.

One requirement I should have mentioned earlier: it has to be free!

I have not found any patterns for using Whoosh with MongoDB in a Python Flask application.

Can anyone provide more information on how to handle full text search in a small Python application on Heroku?

Solution

I've not tried it, but http://tenderlove.github.com/texticle/ seems to imply that you can use native PostgreSQL full text search if you can fit within the space limits. The trouble with Whoosh is that you're going to run into issues with disk space and with keeping its index files persistent under Heroku's rules.
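For illustration only, here is a rough sketch of what a native PostgreSQL full text query might look like from Python. The table name `emails`, its columns `id`, `subject` and `body`, and the helper `build_search_query` are all hypothetical; the SQL relies on PostgreSQL's built-in `to_tsvector`, `plainto_tsquery` and `ts_rank` functions, and the `%s` placeholders assume a driver such as psycopg2.

```python
def build_search_query(terms, limit=20):
    """Build a parameterized PostgreSQL full text query.

    Assumes a hypothetical table `emails(id, subject, body)`.
    Returns the SQL string and the parameter tuple to pass to the driver.
    """
    sql = (
        "SELECT id, subject, "
        "ts_rank(to_tsvector('english', body), "
        "plainto_tsquery('english', %s)) AS rank "
        "FROM emails "
        "WHERE to_tsvector('english', body) @@ plainto_tsquery('english', %s) "
        "ORDER BY rank DESC "
        "LIMIT %s"
    )
    # The search terms appear twice: once for ranking, once for matching.
    return sql, (terms, terms, limit)


query, params = build_search_query("mailing list archive", limit=5)
# With a psycopg2 cursor (assumed), this would be run as:
#   cursor.execute(query, params)
```

At around 10k small documents, computing `to_tsvector` on the fly is probably tolerable; a stored `tsvector` column with a GIN index is the usual next step if it is not.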

The other option is to work with the add-ons suggested in the Dev Center docs: http://devcenter.heroku.com/articles/full-text-search

As for patterns, you basically run the full text search, get back the data/IDs of the matching records, and then query your data store (Mongo) for the full records based on those results. It's a manual process, but nothing too strange. If the search doesn't need full records, you can usually get away with stashing the important data alongside the full text information, but that will increase the size of your full text index.
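A minimal sketch of that two-step pattern, under some loud assumptions: a tiny in-memory inverted index stands in for the real full text engine (Whoosh, Solr, or an add-on), and a plain dict named `DOCUMENTS` stands in for the Mongo collection. All names and sample emails here are hypothetical.

```python
import re
from collections import defaultdict

# Stand-in for the primary data store (e.g. a MongoDB collection keyed by _id).
DOCUMENTS = {
    1: {"subject": "Weekly meeting", "body": "Notes from the weekly planning meeting."},
    2: {"subject": "Server outage", "body": "The mail server was down on Monday."},
    3: {"subject": "Meeting moved", "body": "The planning meeting moved to Tuesday."},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for token in tokenize(doc["subject"] + " " + doc["body"]):
            index[token].add(doc_id)
    return index


def search_ids(index, query):
    """Step 1: ask the search index for IDs matching every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(matches)


def fetch_documents(documents, ids):
    """Step 2: load the full records from the data store by ID."""
    return [documents[i] for i in ids if i in documents]


INDEX = build_index(DOCUMENTS)
ids = search_ids(INDEX, "planning meeting")
results = fetch_documents(DOCUMENTS, ids)
```

With a real Mongo collection, the second step would typically become a single query on `_id` with the `$in` operator rather than a dict lookup.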

OTHER TIPS

pysolr solves your problem.

Have you considered using Apache Solr? I think it's the best solution for a free-text search engine, and it's free and open source.

To use Solr from Python, I recommend the MySolr library. It is faster and easier to use than pysolr (you can see some stats here).

Licensed under: CC-BY-SA with attribution
