How would I search for blank facets and a specific value in a multivalued facet field at the same time in Solr?

StackOverflow https://stackoverflow.com/questions/2250751

  •  20-09-2019

Question

I have an application where users can pick car parts. They pick their vehicle and then pick vehicle attributes as facets. After they select their vehicle, they can pick facets like engine size, for example, to narrow down the list of results. The problem was, not all documents have an engine size (it's an empty value in Solr), as it doesn't matter for all parts. For example, an engine size rarely matters for an air filter. So even if a user picked 3.5L for their engine size, I still wanted to show the air filters on the screen as a possible part the user could pick. I did some searching and the following facet query works perfectly:

enginesize:"3.5" OR enginesize:(*:* AND -enginesize:[* TO *]) 

This query would match either 3.5 or would match records where there was no value for the engine size field (no value meant it didn't matter, and it fit the car). Perfect...

THE PROBLEM: I recently made the vehicle attribute fields multivalued, so I could store the attributes for each part as a list. I then applied faceting to them, and it worked fine. The problem came up when I applied the query above. Selecting the enginesize facet narrowed the results to only the documents that have that engine size, but records (I use "record" to mean document) with empty values (i.e. "") for enginesize no longer appeared. The query does not behave for a multivalued field the way it did when enginesize was single-valued.

Example:

 <doc> 
  <str name="part">engine mount</str>
  <arr name="enginesize">
   <str/>
   <str/>
   <str>3.5</str>
   <str>3.5</str>
   <str>3.5</str>
   <str>3.5</str>
   <str>3.5</str>
  </arr>
 </doc>

<doc> 
  <str name="part">engine bolt</str>
  <arr name="enginesize">
   <str>6</str>
   <str>6</str>
   <str>6</str>
   <str>6</str>
   <str>6</str>
  </arr>
 </doc>

 <doc> 
  <str name="part">air filter</str>
  <arr name="enginesize">
   <str/>
   <str/>
   <str></str>
   <str></str>
   <str></str>
   <str></str>
   <str></str>
  </arr>
 </doc>

What I am looking for is a query that returns documents 1 and 3 above when I facet on an engine size of 3.5. The first document (the engine mount) matches because one of the values in its multivalued enginesize field is 3.5. The third document (the air filter), however, doesn't get returned because of its empty <str> values. I do not want the second document returned at all, because it doesn't match the facet value.

I basically want a query that will match empty string values for a given facet and also match the actual value, so I get both documents returned.

Does someone have a query that would return document 1 and document 3 (the engine mount and the air filter), but not the engine bolt document?
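To pin down the matching rule being asked for, here is a minimal Python sketch that applies it to the three example documents outside of Solr. It is not a Solr query. The helper names (is_blank, wanted) are made up for illustration, and it assumes that an empty string and a whitespace-only string both mean "no engine size".

```python
def is_blank(value):
    """Treat None, "" and whitespace-only strings as 'no engine size'."""
    return value is None or value.strip() == ""


def wanted(doc, field, selected):
    """True if the doc lists the selected value, or lists no real value at all."""
    values = doc.get(field, [])
    return selected in values or all(is_blank(v) for v in values)


# The three documents from the example above, as plain dictionaries.
docs = [
    {"part": "engine mount", "enginesize": ["", "", "3.5", "3.5", "3.5", "3.5", "3.5"]},
    {"part": "engine bolt", "enginesize": ["6"] * 5},
    {"part": "air filter", "enginesize": [""] * 7},
]

matches = [d["part"] for d in docs if wanted(d, "enginesize", "3.5")]
print(matches)  # ['engine mount', 'air filter']
```

Any Solr query that solves the problem should select the same two parts for a selected value of 3.5.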

I tried the following without success (including the one at the very top of this question):

// returns everything
enginesize:"3.5"    OR  (enginesize:[* TO *] )
// only returns document 1
enginesize:"3.5"    OR  (enginesize:["" TO ""] AND -enginesize:"3.5")
// only returns document 1
enginesize:"3.5" OR (enginesize:"")

I imported the data above from a CSV file, with keepEmpty=true set for the field. I also tried manually inserting a space into the field when generating the CSV file (which gives <str> </str> instead of the previous <str/>) and then retried the queries. Doing that, I got the following results:

// returns document 1
enginesize:"3.5" OR enginesize:(*:* AND -enginesize:[* TO *])
// returns all documents
enginesize:"3.5"    OR  (enginesize:["" TO ""] AND -enginesize:"3.5")
// returns all documents
enginesize:"3.5" OR (enginesize:"")

Does anyone have a query that would work for either situation, whether I have a space as the blank value or simply no value at all?


Solution

How about changing how you index, instead of how you query?

Instead of trying to index "engine size doesn't matter" as an empty record, index it as "ANY".

Then your query simply becomes enginesize:"3.5" OR (enginesize:ANY)
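A minimal sketch of how that could look in the code that prepares documents for indexing, written in Python. The function names are hypothetical. It also makes one assumption the answer does not state: a part that lists at least one real engine size only has its blank entries dropped, and only a part with no real value at all is tagged ANY.

```python
ANY = "ANY"  # marker value suggested in the answer above


def normalize_values(values):
    """Drop blank entries; fall back to [ANY] when nothing real remains.

    Assumption: a part with at least one real engine size is NOT also tagged ANY.
    """
    real = [v.strip() for v in values if v and v.strip()]
    return real or [ANY]


def engine_filter(selected):
    """Build the filter from the answer for one selected facet value."""
    # Escape backslashes and double quotes so the value stays inside the phrase.
    escaped = selected.replace("\\", "\\\\").replace('"', '\\"')
    return 'enginesize:"{}" OR (enginesize:{})'.format(escaped, ANY)


air_filter = normalize_values(["", "", " "])         # no real value at all
engine_mount = normalize_values(["", "3.5", "3.5"])  # blanks mixed with 3.5
query = engine_filter("3.5")
print(air_filter, engine_mount, query)
```

Because the marker is an ordinary indexed term, the same query works whether the blanks in the source data were empty strings or single spaces.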

OTHER TIPS

i've just been playing with this and found a hint that seems to do the trick for me. translated to your query it should be:

enginesize:"3.5" OR (-enginesize:["" TO *])

hth,

andi


update: after some more testing i don't think this works reliably. for some indexes it had to be the other way round and without the minus sign, i.e. enginesize:[* TO ""]. this might depend on the index type, on whether the field is multi-valued, or even on the actual values.

in any case it seems too much of a hack. i'll probably resort to substituting the empty value with a special marker...

I had the same problem, but solved it in https://stackoverflow.com/a/35633038/13365:

enginesize:"3.5" OR (*:* NOT enginesize:["" TO *])

The -enginesize solution didn't work for me.
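A likely reason for the difference is that Solr only rewrites a purely negative query when it is the top-level query. Inside parentheses, a lone negative clause has nothing to subtract from, so the *:* has to be written out. The Python sketch below builds that filter for an arbitrary field and value and URL-encodes it as an fq parameter. The helper name value_or_missing is made up, and whether the range also excludes values indexed as "" or a single space depends on the field type, so it is worth testing against your own index.

```python
from urllib.parse import urlencode


def value_or_missing(field, value):
    """Match documents that have `value` in `field`, or no indexed term in `field`."""
    # Escape backslashes and double quotes so the value stays inside the phrase.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '{f}:"{v}" OR (*:* NOT {f}:["" TO *])'.format(f=field, v=escaped)


fq = value_or_missing("enginesize", "3.5")
print(fq)  # enginesize:"3.5" OR (*:* NOT enginesize:["" TO *])

# Encode the request parameters for a select request (q and fq are standard Solr parameters).
query_string = urlencode({"q": "*:*", "fq": fq})
print(query_string)
```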

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow