Question

My website stores several million entities. Visitors search for entities by typing words, and only the titles are searched. The titles are at most 100 characters long.

This is not classic document search, where users search inside large blobs of text: the fields are very short. Also, the main concern here is performance rather than relevance, since matching entities are suggested as the user types (autocomplete).

What would be the smarter route?

  • Create a MySQL table [word, entity_id], index the 'word' column, and then query using
    select entity_id from search_index where word like '[query_word]%'
    This obviously requires breaking each title into its words and adding a row for each word.
  • Use Solr or a similar search engine, which, from what I have read, is more oriented towards full-text search.
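The first option above can be sketched as follows. This is a minimal illustration, not a tuned implementation: it uses Python's built-in `sqlite3` as a stand-in for MySQL so it runs on its own, and the sample data and the helper names `tokenize`, `build_index` and `suggest` are hypothetical. It splits each title into distinct lowercase words, stores one [word, entity_id] row per word, and answers a typed prefix with a trailing-wildcard `LIKE`.

```python
import re
import sqlite3

# Hypothetical sample data: (entity_id, title). Real titles are at most 100 characters.
ENTITIES = [
    (1, "Blue Whale Migration"),
    (2, "Whale Watching in Iceland"),
    (3, "Blueberry Pie Recipe"),
]


def tokenize(title):
    """Split a title into distinct lowercase words (runs of letters, digits, underscores)."""
    return set(re.findall(r"\w+", title.lower()))


def build_index(conn, entities):
    """Create the [word, entity_id] table and add one row per distinct word per title."""
    conn.execute("CREATE TABLE search_index (word TEXT NOT NULL, entity_id INTEGER NOT NULL)")
    conn.execute("CREATE INDEX idx_search_word ON search_index (word)")
    rows = [(word, eid) for eid, title in entities for word in tokenize(title)]
    conn.executemany("INSERT INTO search_index (word, entity_id) VALUES (?, ?)", rows)
    conn.commit()


def suggest(conn, query_word, limit=10):
    """Return ids of entities that have a title word starting with query_word."""
    # Escape LIKE metacharacters so user input such as "50%" is matched literally.
    escaped = (query_word.lower()
               .replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_"))
    # Trailing wildcard only: in MySQL this lets a B-tree index on `word` serve the lookup.
    cur = conn.execute(
        "SELECT DISTINCT entity_id FROM search_index "
        "WHERE word LIKE ? ESCAPE '\\' ORDER BY entity_id LIMIT ?",
        (escaped + "%", limit),
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
build_index(conn, ENTITIES)
print(suggest(conn, "blue"))  # [1, 3]
```

The prefix "blue" matches both the word "blue" (entity 1) and "blueberry" (entity 3); `DISTINCT` keeps an entity from appearing twice when several of its words match.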

Also, how would each option affect me if I want to introduce spelling suggestions in the future?

Thank you!


Solution

Pros of a Database-Only Solution:

  • Less setup and maintenance (you already have a database)
  • If you want to JOIN your search results with other data or otherwise manipulate them, you can do so natively in the database
  • There is no sync lag (which you would have if you periodically synced Solr with your database) and no extra maintenance burden (which you would have if you added/updated entries in Solr in real time everywhere you insert them into the database)
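To illustrate the JOIN point above, here is a minimal sketch, again using Python's built-in `sqlite3` as a stand-in for MySQL. The `entities` table, its `popularity` column, the sample rows and the `suggest_titles` helper are all hypothetical; the idea is simply that word-index hits can be joined back to the entity rows and ranked by other data in a single query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities (id INTEGER PRIMARY KEY, title TEXT NOT NULL, popularity INTEGER NOT NULL);
CREATE TABLE search_index (word TEXT NOT NULL, entity_id INTEGER NOT NULL REFERENCES entities(id));
CREATE INDEX idx_search_word ON search_index (word);
INSERT INTO entities VALUES
    (1, 'Blue Whale Migration', 40),
    (2, 'Whale Watching in Iceland', 95),
    (3, 'Whale Song', 10);
INSERT INTO search_index VALUES
    ('blue', 1), ('whale', 1), ('migration', 1),
    ('whale', 2), ('watching', 2), ('in', 2), ('iceland', 2),
    ('whale', 3), ('song', 3);
""")


def suggest_titles(conn, prefix, limit=5):
    """Return (id, title) pairs whose title has a word starting with prefix, most popular first."""
    # LIKE metacharacter escaping is omitted here for brevity.
    cur = conn.execute("""
        SELECT e.id, e.title
        FROM entities AS e
        JOIN (SELECT DISTINCT entity_id FROM search_index WHERE word LIKE ?) AS hits
          ON hits.entity_id = e.id
        ORDER BY e.popularity DESC
        LIMIT ?
    """, (prefix.lower() + "%", limit))
    return cur.fetchall()


print(suggest_titles(conn, "wha", limit=2))
# [(2, 'Whale Watching in Iceland'), (1, 'Blue Whale Migration')]
```

With Solr, the same ranking would need the popularity value copied into the search index or a second lookup against the database afterwards, which is the trade-off mentioned at the end of this answer.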

Pros of a Solr Solution:

  • Performance: Solr handles caching and is fast out of the box
  • Spell checking: if you plan to offer spelling suggestions, Solr supports this natively
  • Setting up and tuning Solr isn't very painful, although it helps if you are familiar with Java application servers
  • Although your requirements seem simple, I think you are getting at having some kind of logic around searching for words; Solr does this very well
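For comparison, if you stay database-only, spelling suggestions would have to be built by hand. The sketch below is a deliberately naive illustration of one common technique (Levenshtein edit distance against the vocabulary of indexed words); it is not how Solr implements its spell checker, and the names `levenshtein`, `spelling_suggestions` and `VOCABULARY` are hypothetical. In practice the vocabulary might come from something like `SELECT DISTINCT word FROM search_index`, and a linear scan over it on every keystroke may be too slow at your scale.

```python
# Hypothetical vocabulary; in practice, the distinct words from the word index.
VOCABULARY = {"blue", "whale", "whales", "watching", "iceland", "migration"}


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert/delete/substitute each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        previous = current
    return previous[-1]


def spelling_suggestions(word, vocabulary, max_distance=2, limit=3):
    """Return up to `limit` known words within `max_distance` edits, closest first."""
    word = word.lower()
    if word in vocabulary:
        return []  # known word: nothing to suggest
    scored = sorted((levenshtein(word, v), v) for v in vocabulary)
    return [v for dist, v in scored if dist <= max_distance][:limit]


print(spelling_suggestions("migrtion", VOCABULARY))  # ['migration']
```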

You may also want to consider future requirements. What if your documents end up having more than just a title field and you want to assign some kind of relevancy? What if you decide to let people search the body text of these entities, or you want to index other document types such as MS Word files? What if you want to facet search results? Solr is good at all of these.

I am not sure you need to create a row for every word in your database; you could instead run a '%[query_word]%' search directly on the title column (keeping in mind that a pattern starting with a wildcard cannot use an ordinary B-tree index, so each such query scans the whole table). It may be simpler to just go with a database for starters, since the requirements seem pretty simple, and database performance should be fairly easy to scale.
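One practical difference between the two database approaches discussed above is what they match. The minimal sketch below (Python's built-in `sqlite3` standing in for MySQL; sample titles and the helpers `substring_search` and `word_prefix_search` are hypothetical) shows that a '%word%' search on the title also matches inside words, while the word-index prefix search only matches the start of a word. Note that SQLite's `LIKE` is case-insensitive for ASCII by default; in MySQL, case sensitivity depends on the column's collation.

```python
import re
import sqlite3

# Hypothetical sample titles: (entity_id, title).
TITLES = [(1, "Art History Basics"), (2, "Start Here"), (3, "Modern Art")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.executemany("INSERT INTO entities (id, title) VALUES (?, ?)", TITLES)
conn.execute("CREATE TABLE search_index (word TEXT NOT NULL, entity_id INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO search_index (word, entity_id) VALUES (?, ?)",
    [(w, eid) for eid, title in TITLES for w in set(re.findall(r"\w+", title.lower()))],
)


def substring_search(q):
    """'%q%' on the title: matches anywhere, but a leading wildcard forces a full scan."""
    # LIKE metacharacter escaping is omitted here for brevity.
    cur = conn.execute("SELECT id FROM entities WHERE title LIKE ? ORDER BY id", ("%" + q + "%",))
    return [row[0] for row in cur]


def word_prefix_search(q):
    """'q%' on the word index: matches only at the start of a word, can use an index."""
    cur = conn.execute(
        "SELECT DISTINCT entity_id FROM search_index WHERE word LIKE ? ORDER BY entity_id",
        (q.lower() + "%",),
    )
    return [row[0] for row in cur]


print(substring_search("art"))    # [1, 2, 3]  ("Start" also contains "art")
print(word_prefix_search("art"))  # [1, 3]
```

Which behaviour is preferable for autocomplete is a product decision; the word index trades extra rows and insert work for index-friendly lookups.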

I can tell you that we use Solr on our site, we love its performance, and we use it even for very simple lookups. However, one thing we are missing is a way to combine Solr data with database data, and there is extra maintenance. At the end of the day, there is no easy answer.

Licensed under: CC-BY-SA with attribution