Matching a complete complex nested collection item instead of separate members with Elastic Search

StackOverflow https://stackoverflow.com/questions/20839428

22-09-2022

Question

I have an index with a nested collection of items I want to hit. A collection item contains multiple properties that all have to match a certain query, not just any of them. Here's the model:

public class IndexEntry1
{
    public IEnumerable<NestedType1> NestedProperty1 { get; set; }
}

public class NestedType1
{
    public string Member1 { get; set; }
    public string Member2 { get; set; }
}

So I want to hit only documents that have a specific combination of Member1 and Member2 values in the IndexEntry1.NestedProperty1 collection.

I figured that I had to map the collection as nested.

Here's the mapping:

index: {
    properties: {
        nestedProperty1: {
            type: "nested",
            properties: {
                member1: {
                    type: "string",
                    index_analyzer: "my_index_analyzer_1",
                    search_analyzer: "my_search_analyzer_1"
                },
                member2: {
                    type: "string",
                    analyzer: "keyword"
                }
            }
        }
    },
    analysis: {
        tokenizer: {
            my_ngram: {
                type: "nGram",
                min_gram: "1",
                max_gram: "15"
            }
        },
        analyzer: {
            my_index_analyzer_1: {
                type: "custom",
                tokenizer: "my_ngram",
                filters: ["lowercase"]
            },
            my_search_analyzer_1: {
                type: "custom",
                tokenizer: "whitespace",
                filters: ["lowercase"]
            }
        }
    }
}

and use a query like this:

client.Search<IndexEntry1>(d => d
    .Query(query => query
        .Nested(n => n
            .Path(p => p.NestedProperty1)
                .Query(q => q
                    .Bool(b => b
                        .Must(
                            m => m.Term("member1", "value1"),
                            m => m.QueryString(s => s.OnField("member2")
                                .Query("value2"))))))));

However, I'm still getting hits on any document that has either the value1 value or the value2 value while I expect only hits on documents with both values on the same NestedProperty1 collection item.

Solution

You are correct that it should only find documents where the nested document has value1 AND value2.

You can verify that this is how Elasticsearch behaves by running the following in Sense (a Chrome plugin):

PUT http://localhost:9200/nested_example
{
    "mappings": {
        "indexentry" : {
            "properties": {
                "nestedCollection": {
                    "type": "nested",
                    "properties": {
                        "prop1" : { "type": "string", "index": "not_analyzed" },
                        "prop2" : { "type": "string", "index": "not_analyzed" }
                    }
                }
            } 
        }
    }
}
POST http://localhost:9200/nested_example/indexentry/1
{
    "nestedCollection": [
        { "prop1" : "A", "prop2" : "A" },
        { "prop1" : "B", "prop2" : "B" }
    ]
}
POST http://localhost:9200/nested_example/indexentry/2
{
    "nestedCollection": [
        { "prop1" : "C", "prop2" : "C" },
        { "prop1" : "D", "prop2" : "D" }
    ]
}

POST http://localhost:9200/nested_example/indexentry/_search
{
    "query": {
        "nested": {
           "path": "nestedCollection",
           "query": {
                "bool": {
                    "must": [
                       {
                           "term": {
                              "nestedCollection.prop1": {
                                 "value": "A"
                              }
                           }
                       },
                       {
                           "term": {
                              "nestedCollection.prop2": {
                                 "value": "A"
                              }
                           }
                       }

                    ]
                }
           }
        }
    }
}

The previous query finds only document 1. As soon as you change the term query for nestedCollection.prop2 to look for B instead of A, you get no hits at all, as expected.
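
To make that expectation concrete, here is a minimal in-memory sketch of the two matching semantics, using the two documents indexed above. It does not call Elasticsearch or NEST. The names MatchesNested, MatchesFlattened and Show are hypothetical helpers introduced only for illustration, and the sketch assumes C# 9 or later (top-level statements). MatchesNested models a nested mapping, where both terms must hit the same collection item. MatchesFlattened models a plain object mapping, where the values of each field are pooled across all items of a document.

```csharp
using System;
using System.Linq;

// Nested semantics: both values must be found on the SAME collection item.
bool MatchesNested((string Prop1, string Prop2)[] items, string prop1, string prop2) =>
    items.Any(i => i.Prop1 == prop1 && i.Prop2 == prop2);

// Flattened (plain object) semantics: each value may come from a different item.
bool MatchesFlattened((string Prop1, string Prop2)[] items, string prop1, string prop2) =>
    items.Any(i => i.Prop1 == prop1) && items.Any(i => i.Prop2 == prop2);

// Formats a list of document ids for printing.
string Show(int[] ids) => "[" + string.Join(", ", ids) + "]";

// The two documents from the example above, as (id, items) pairs.
var docs = new (int Id, (string Prop1, string Prop2)[] Items)[]
{
    (1, new[] { ("A", "A"), ("B", "B") }),
    (2, new[] { ("C", "C"), ("D", "D") }),
};

// prop1 = A and prop2 = A: one item of document 1 satisfies both terms.
var nestedAA = docs.Where(d => MatchesNested(d.Items, "A", "A")).Select(d => d.Id).ToArray();

// prop1 = A and prop2 = B: no single item carries both values.
var nestedAB = docs.Where(d => MatchesNested(d.Items, "A", "B")).Select(d => d.Id).ToArray();

// Same values with flattened semantics: document 1 matches across two items.
var flatAB = docs.Where(d => MatchesFlattened(d.Items, "A", "B")).Select(d => d.Id).ToArray();

Console.WriteLine("nested A/A:    " + Show(nestedAA)); // nested A/A:    [1]
Console.WriteLine("nested A/B:    " + Show(nestedAB)); // nested A/B:    []
Console.WriteLine("flattened A/B: " + Show(flatAB));   // flattened A/B: [1]
```

If your index returns results like the last line, it is worth checking whether the nested mapping was actually applied to the index you are querying.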

If I update the example to be closer to your mappings and queries, I still cannot reproduce the behaviour you're witnessing:

PUT http://localhost:9200/nested_example
{
   "settings": {
      "analysis": {
         "tokenizer": {
            "my_ngram": {
               "type": "nGram",
               "min_gram": "1",
               "max_gram": "15"
            }
         },
         "analyzer": {
            "my_index_analyzer_1": {
               "type": "custom",
               "tokenizer": "my_ngram",
               "filters": [
                  "lowercase"
               ]
            },
            "my_search_analyzer_1": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filters": [
                  "lowercase"
               ]
            }
         }
      }
   },
   "mappings": {
      "indexentry": {
         "properties": {
            "nestedCollection": {
               "type": "nested",
               "properties": {
                  "prop1": {
                     "type": "string",
                     "index_analyzer": "my_index_analyzer_1",
                     "search_analyzer": "my_search_analyzer_1"
                  },
                  "prop2": {
                     "type": "string",
                     "analyzer": "keyword"
                  }
               }
            }
         }
      }
   }
}
POST http://localhost:9200/nested_example/indexentry/1
{
    "nestedCollection": [
        { "prop1" : "value1", "prop2" : "value1" },
        { "prop1" : "value2", "prop2" : "value2" }
    ]
}
POST http://localhost:9200/nested_example/indexentry/2
{
    "nestedCollection": [
        { "prop1" : "value3", "prop2" : "value3" },
        { "prop1" : "value4", "prop2" : "value4" }
    ]
}

POST http://localhost:9200/nested_example/indexentry/_search
{
    "query": {
        "nested": {
           "path": "nestedCollection",
           "query": {
                "bool": {
                    "must": [
                       {
                           "term": {
                              "prop1": {
                                 "value": "value1"
                              }
                           }
                       },
                       {
                           "query_string": {
                               "fields": [
                                  "prop2"
                               ],
                               "query": "value1"

                           }
                       }

                    ]
                }
           }
        }
    }
}
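
One thing to keep in mind with the mapping above: prop1 is indexed through an n-gram tokenizer (min_gram 1, max_gram 15), while a term query is not analyzed and looks up its value verbatim. The following is a rough sketch of what that means for the indexed tokens, assuming the tokenizer emits every substring whose length falls within the configured range. NGrams is a hypothetical helper written for illustration, not an Elasticsearch or NEST API, and the sketch assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;

// Rough model of an n-gram tokenizer followed by a lowercase filter:
// every substring of text with a length between minGram and maxGram.
List<string> NGrams(string text, int minGram, int maxGram)
{
    var grams = new List<string>();
    for (var start = 0; start < text.Length; start++)
    {
        for (var len = minGram; len <= maxGram && start + len <= text.Length; len++)
        {
            grams.Add(text.Substring(start, len).ToLowerInvariant());
        }
    }
    return grams;
}

// Tokens that would be indexed for prop1 = "value1" under this model.
var indexed = NGrams("value1", 1, 15);

Console.WriteLine(indexed.Count);              // 21
Console.WriteLine(indexed.Contains("value1")); // True: the full value is one of the grams
Console.WriteLine(indexed.Contains("value"));  // True: a shorter term matches as well
```

Under that assumption, a term query on prop1 also matches documents whose value merely contains the search term, so real data with overlapping values can produce more hits than the test documents suggest.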

Can you update the previous example to better reproduce your situation?

As a final note, in NEST you can rewrite the query as:

client.Search<IndexEntry1>(d => d
    .Query(query => query
        .Nested(n => n
            .Path(p => p.NestedProperty1)
            .Query(q => 
                q.Term(p => p.NestedProperty1.First().Member1, "value1")
                && q.QueryString(s => s
                    .OnField(p => p.NestedProperty1.First().Member2)
                    .Query("value2")
                )
            )
        )
    )
);

Strongly typed and with less nesting going on.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow