Is it more performant to do an OR on multiple fields, or combine those field values into a single field name?

StackOverflow https://stackoverflow.com/questions/23497676

Question

I have a datastore model that looks like:

class Project(ndb.Model):
   name = ndb.StringProperty()
   statement = ndb.StringProperty()
   description = ndb.StringProperty()

We are building out a Search API implementation from our model data... so I will be building Search API Documents and mapping our datastore models to them.

The end result is that I want to search all three of these fields from a single query. For example, a user types "city" and the system should find all Projects that have the word "city" in name, statement, or description.

I could define the Document with three fields that generally map 1-to-1 with the model such as:

fields = [
   TextField(name="name", value=proj.name),
   TextField(name="statement", value=proj.statement),
   HtmlField(name="description", value=proj.description),
]

and then query with

"name:city OR statement:city OR description:city"

Of course, the Search API documentation says:

The "OR" disjunction is an expensive operation in both billable operations and computation time

So my other option could be to combine these into a single searchable fieldname like:

fields = [
   TextField(name="search", value=proj.name),
   TextField(name="search", value=proj.statement),
   HtmlField(name="search", value=proj.description),
]

and query with:

"search:city"

Should I assume the latter would perform better? But that approach would lose the distinction in field names, and possibly lose future benefits of "custom scoring" that the GAE team may add (see this question/answer):

Google App Engine Search API

Am I just trying to over-optimize too early and overthink everything? What say ye?


Solution

You can continue indexing as you are doing now:

fields = [
   TextField(name="name", value=proj.name),
   TextField(name="statement", value=proj.statement),
   HtmlField(name="description", value=proj.description),
]

NOTE: You may want to strip the HTML before indexing it, unless keeping the markup really adds value.
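One way to do the stripping mentioned in the note, as a minimal sketch using only the standard library. The helper name strip_html is hypothetical (it is not part of the Search API), and the import assumes Python 3; on the Python 2.7 runtime the module is called HTMLParser instead.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self._chunks.append(data)

    def text(self):
        # Join chunks with a space so words separated only by tags
        # do not merge, then collapse the resulting whitespace runs.
        return ' '.join(' '.join(self._chunks).split())


def strip_html(markup):
    """Hypothetical helper: return the text content of an HTML string."""
    parser = _TextExtractor()
    parser.feed(markup or '')  # tolerate None from an unset property
    parser.close()             # flush any buffered trailing text
    return parser.text()
```

The result could then go into a plain TextField instead of an HtmlField. Note that this sketch keeps the contents of script and style elements as text, so it is only suitable for simple markup.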

Then use the user's search term on its own as the query. If you don't specify a field, you'll get results for documents with matches in any field.

your_index.search(query)  # This will match the value of city in any field

If you have more fields than the ones you mention here, and don't want those included in your search, then using one extra field that holds the combined values of those 3 would be the way to go:

fields = [
   TextField(name="name", value=proj.name),
   TextField(name="statement", value=proj.statement),
   HtmlField(name="description", value=proj.description),
   TextField(name="foo", value=proj.foo),
   TextField(name="bar", value=proj.bar),
   TextField(name="composed_field", value=' '.join((
       proj.name,
       proj.statement,
       proj.description,
   ))),  # or something like this
]

And then:

your_index.search('composed_field:"%s"' % query)  # Look ma', no OR
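Two details are easy to trip over with the combined-field approach: an unset ndb.StringProperty is None, which makes ' '.join raise a TypeError, and a search term containing a double quote would close the quoted phrase early. A minimal sketch of two helpers that guard against both; join_field_values and composed_query are hypothetical names, not Search API functions, and dropping quotes from the term is just one possible policy.

```python
def join_field_values(*values):
    """Join the non-empty values into one string for a combined field."""
    # Skip None and whitespace-only values so ' '.join never sees None.
    return ' '.join(v.strip() for v in values if v and v.strip())


def composed_query(term, field_name='composed_field'):
    """Build a quoted single-field query such as composed_field:"city"."""
    # Replace double quotes so the term cannot end the quoted phrase early,
    # then collapse any whitespace runs that leaves behind.
    cleaned = ' '.join(term.replace('"', ' ').split())
    return '%s:"%s"' % (field_name, cleaned)
```

With these, the combined field value would be join_field_values(proj.name, proj.statement, proj.description), and the search call becomes your_index.search(composed_query(query)).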
Licensed under: CC-BY-SA with attribution