Question

What is the best way to search for a phrase when not all of its words match? For example:

description = "a cell phone that have an external memory"

and i want to search:

search = "a good phone"

Are there any tips for doing this with MongoDB, or do I have to use Knuth-Morris-Pratt string matching from Python (which would kill the server)?


Solution

For a simple regular-expression search of a MongoDB field, you can use find with the "$regex" query expression.

In pymongo that would be db.your_collection.find({"description": {"$regex": "<insert regex here>"}}).
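As a minimal sketch, assuming a local mongod and hypothetical database/collection names (mydb, your_collection), matching any word of the example search phrase:

    from pymongo import MongoClient

    client = MongoClient()          # assumes a local mongod; adjust the URI as needed
    db = client["mydb"]             # hypothetical database name

    # Match documents whose description contains any word of the search phrase,
    # ignoring case. Note: an unanchored regex cannot use an index efficiently.
    query = {"description": {"$regex": "good|phone", "$options": "i"}}
    for doc in db.your_collection.find(query):
        print(doc["description"])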

This will get you started. As others have stated, MongoDB doesn't necessarily appreciate you beating it up like this. You may need to consider a more robust solution for big time searching.

Please consider the performance implications of doing a regular expression search in your DB.

Read the MongoDB reference here http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-RegularExpressions.

Other tips

MongoDB isn't really geared for such shenanigans. I would recommend using an external service like SphinxSearch or Solr for your searching needs.

You could use MapReduce to build a search index and then search in the resulting collection.

Your map function would first split the description into individual words. Very common words like "a" or "the" should be discarded. Then it would do an emit per word. Key is the word and value is the _id of the currently processed document.

Your reduce function would then be used to collect all documents which contain each word: it would merge the emitted _id values into one array, remove duplicates, and return that array as the value for the key.

The resulting collection of this MapReduce job would then contain one document for each individual word which appears in the descriptions. These documents would contain the word and an array with the _id's of the documents where it appears. When you add an index you can search it very quickly.

This MapReduce job needs to be performed once to build the search index. It will take a while if you already have a lot of data in the database. Whenever a document is added or removed, or when a document's description changes, you have to perform an incremental MapReduce to update the search index. The incremental MapReduce will be much faster than the initial one, so it should be feasible to run it automatically.
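A rough sketch of such a MapReduce job, using pymongo's legacy map_reduce helper (available in PyMongo 3.x, removed in 4.x); the products collection, the stop-word list and the search_index output name are assumptions for illustration:

    from pymongo import MongoClient
    from bson.code import Code

    client = MongoClient()
    db = client["mydb"]  # hypothetical database / collection names throughout

    # Emit one (word, {ids: [_id]}) pair per non-stop-word in the description.
    map_fn = Code("""
    function () {
        var stop = {"a": 1, "an": 1, "the": 1, "that": 1, "have": 1};
        var doc = this;
        this.description.toLowerCase().split(/\\W+/).forEach(function (word) {
            if (word.length > 0 && !stop[word]) {
                emit(word, {ids: [doc._id]});
            }
        });
    }
    """)

    # Merge all emitted id arrays for a word and drop duplicates.
    reduce_fn = Code("""
    function (key, values) {
        var seen = {}, merged = [];
        values.forEach(function (v) {
            v.ids.forEach(function (id) {
                if (!seen[id]) { seen[id] = true; merged.push(id); }
            });
        });
        return {ids: merged};
    }
    """)

    # Writes one document per word into the "search_index" collection;
    # its _id is the word itself, so lookups by word are already indexed.
    db.products.map_reduce(map_fn, reduce_fn, "search_index")

    # Find the documents containing the word "phone".
    entry = db.search_index.find_one({"_id": "phone"})
    print(entry["value"]["ids"] if entry else [])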

No one here has actually referenced the doc page on searching: http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo

A good way of avoiding methods which will not scale, such as MapReduce and regex, is to store an array of keywords within your doc.

You would decide how you wish to split and normalise the words (prefix/infix handling, stemming, etc.) and which stop words you want to remove, and once that is done you would just shove the result into a big array in the doc itself.
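A minimal sketch of that approach, with a hypothetical products collection and an illustrative stop-word list:

    from pymongo import MongoClient

    client = MongoClient()
    db = client["mydb"]  # hypothetical database / collection names

    STOP_WORDS = {"a", "an", "the", "that", "have"}  # illustrative stop-word list

    def keywords(text):
        """Lower-case the text, split on whitespace and drop stop words."""
        return [w for w in text.lower().split() if w not in STOP_WORDS]

    description = "a cell phone that have an external memory"
    db.products.insert_one({
        "description": description,
        "keywords": keywords(description),  # ["cell", "phone", "external", "memory"]
    })
    db.products.create_index("keywords")    # multikey index over the array

    # Any document sharing at least one keyword with the search phrase matches.
    for doc in db.products.find({"keywords": {"$in": keywords("a good phone")}}):
        print(doc["description"])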

MapReduce is sometimes considered a bad way of doing this because it just won't scale, for performance and threading reasons, and regex because it makes very poor use of indexes in 90% of cases unless the pattern is prefix-anchored. I have seen a simple regex kill a lot of MongoDB servers, so I know just how bad it can be when left untamed.

I do agree with everyone else, though, that you should really look into an external FTS technology. I personally adore Sphinx: http://sphinxsearch.com/ for its speed, scalability and flexibility. However, I have used other search technologies like Solr and they are all pretty darn good.

Just want to add a plug for Elasticsearch. They have a ton of client libraries, including several for Python.

Both Solr and Elasticsearch are built on Apache Lucene, but Elasticsearch has some advantages over Solr, IMO, starting with the fact that it speaks JSON instead of XML.
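For example, a rough sketch with the official elasticsearch-py client (assuming the 8.x API and a local node; the products index name is made up):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # assumes a local node

    # Index a document; Elasticsearch analyses the text at index time.
    es.index(index="products", id=1, document={
        "description": "a cell phone that have an external memory",
    })
    es.indices.refresh(index="products")

    # A match query analyses "a good phone" and matches documents that
    # contain any of the remaining terms, ranked by relevance.
    resp = es.search(index="products", query={
        "match": {"description": "a good phone"}
    })
    for hit in resp["hits"]["hits"]:
        print(hit["_score"], hit["_source"]["description"])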

MongoDB 3.0+: just use a text index on the field containing the phrase. https://docs.mongodb.org/v3.0/core/index-text/
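A short sketch of that, again with hypothetical database/collection names:

    from pymongo import MongoClient

    client = MongoClient()
    db = client["mydb"]  # hypothetical database / collection names

    # Only one text index per collection; it tokenises the field and drops stop words.
    db.products.create_index([("description", "text")])

    # $text matches documents containing any remaining term of the phrase, so
    # "a good phone" still finds "a cell phone that have an external memory".
    cursor = db.products.find(
        {"$text": {"$search": "a good phone"}},
        {"score": {"$meta": "textScore"}},
    ).sort([("score", {"$meta": "textScore"})])

    for doc in cursor:
        print(doc["score"], doc["description"])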

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow