I'm currently trying to use a filter in an existing ElasticSearch instance via the library elasticutils. I'm getting nowhere, unfortunately. I'm not sure if the problem is because I did something basic wrong or if there's a problem in the library (could well be, AFAICT).

I've got an index with a specific mapping, containing a field (say "A") of type string (no explicit analyzer given). That field always contains a list of strings.

I'd like to filter for documents whose field A contains a given string, so I tried:

import elasticutils as eu
es = eu.S().es(urls=[ URL ]).indexes(INDEX).doctypes(DOCTYPE)
f = eu.F(A="text")
result = es.filter(f)

But that returns an empty result set. I also tried it using f = eu.F(A__in="text") but that resulted in a large error message, the most intriguing part of it being [terms] filter does not support [A].

I'm wondering if I have to configure my index differently, maybe I have to create a facet to be able to use filter? But I didn't find any hint on this in the documentation I read.

My reason for wanting to use filters is that they can be combined freely using and, or, and not. I also found some specs describing that a query can be boolean as well, but they typically refer to must, should, and must_not, which I think aren't flexible enough for me. I also found some specs which mention an operator flag for queries that can be set to and or or. Any info on that is welcome.

So, my questions now are:

  • Is it a configuration problem? Do facets have something to do with this?
  • I'd like to test whether this is a library bug by skipping the lib, so how can I perform this filtering action using just, say, curl? Or any other library (maybe pyes)?
  • Is it possible to combine several queries flexibly (using and, or, not, and groupings of them), i.e. without using filters at all? How would I do that? (Preferably in elasticutils, but other library syntaxes, e.g. pyes, or plain curl commands are welcome as well.)

Solution

airza hit the nail on the head with his answer in terms of the filter you're looking for, in curl format. I suspect the issues you're seeing are largely due to using an abstraction module like elasticutils. It would be good to get familiar with the underlying ES query DSL first; that will make elasticutils easier to understand. As in my comment above, I recommend installing 'Sense', a plugin for Google Chrome that lets you easily query your ES cluster: https://chrome.google.com/webstore/detail/sense/doinijnbnggojdlcjifpdckfokbbfpbo?hl=en.

Elasticsearch query filters are extremely flexible and 'nestable'. You can quite easily nest an or filter inside the must clause of a bool filter. Example:

{
    "query": {
        "filtered": {
           "query": {
               "match_all": {}
           },
           "filter": {
               "bool": {
                   "must": [
                       {
                           "or": [
                                 {"exists": {"field": "sessions"}},
                                 {"range": {"id": {"gte": 56000}}}
                           ]
                       },
                       {
                           "term": {"age_min": "13"}
                       }
                   ],
                   "should": [
                      {
                          "term": {"area": "1"}
                      }
                   ]
               }
           }
        }
    }
}

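In this example, results must match at least one of the two filters nested inside the or filter (the exists filter or the range filter) and must also match the age_min term filter. The intent of the area term filter in the should clause is that matching items rank higher than non-matching ones. Note, though, that filters only include or exclude documents and do not contribute to the relevance score, so for a ranking boost the should clause would have to be part of a bool query rather than a bool filter.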
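To illustrate how freely these filters nest, here is a minimal Python sketch that builds the same request body from small helper functions. The helper names (term, exists, range_gte, or_, bool_, filtered_search) are made up for this illustration and are not part of elasticutils or pyes; they only produce plain dicts in the filter DSL shown above.

```python
import json

# Hypothetical helpers (not from elasticutils or pyes). Each one returns a
# plain dict in the Elasticsearch filter DSL used in the example above.

def term(field, value):
    return {"term": {field: value}}

def exists(field):
    return {"exists": {"field": field}}

def range_gte(field, lower):
    return {"range": {field: {"gte": lower}}}

def or_(*filters):
    # An "or" filter wraps a list of sub-filters.
    return {"or": list(filters)}

def bool_(must=(), should=(), must_not=()):
    # Only emit the clauses that were actually given.
    clauses = {}
    if must:
        clauses["must"] = list(must)
    if should:
        clauses["should"] = list(should)
    if must_not:
        clauses["must_not"] = list(must_not)
    return {"bool": clauses}

def filtered_search(filter_):
    # Wrap a filter in a "filtered" query that matches all documents.
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": filter_,
            }
        }
    }

body = filtered_search(
    bool_(
        must=[
            or_(exists("sessions"), range_gte("id", 56000)),
            term("age_min", "13"),
        ],
        should=[term("area", "1")],
    )
)

# The JSON printed here can be pasted into Sense or sent with curl.
print(json.dumps(body, indent=2))
```

Because every helper returns an ordinary dict, any grouping of and, or, and not can be expressed by nesting calls, and the result can be inspected before it is sent anywhere.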

Other tips

The curl request to solve this problem is pretty straightforward:

curl -XPOST URL/INDEX/_search -d '{
  "filter": {
    "term": {
      "A": "val"
    }
  }
}'
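If you would rather stay in Python while still bypassing elasticutils, the same request can be sketched with only the standard library. This is an illustration under assumptions: the base URL http://localhost:9200 and the index name my_index are placeholders, and the function names are invented here. The request is only built, not sent, so you can inspect it first.

```python
import json
import urllib.request

def build_term_filter_request(base_url, index, field, value):
    """Build (but do not send) a POST to <base_url>/<index>/_search
    carrying the same term filter as the curl command above."""
    body = {"filter": {"term": {field: value}}}
    url = "{}/{}/_search".format(base_url.rstrip("/"), index)
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def run_search(request):
    """Send the request and decode the JSON response.
    Needs a reachable Elasticsearch instance, so it is not called here."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

# Placeholder URL and index name (assumptions); substitute your own values.
req = build_term_filter_request("http://localhost:9200", "my_index", "A", "val")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Calling run_search(req) against your own cluster should return the same hits as the curl command, which makes it easy to tell whether an empty result comes from the library or from the index itself.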

There's no particular relationship here to facets (which are a type of search query used to get the size of various subsets of another query), but if the field A is not indexed you won't be able to search for it and find anything. However, if that were the case, your ES query should just return every record (since when you query a non-indexed field you are essentially giving ES no particular filter instructions).

The query spit out by my attempt to perform an equivalent ES search using this library was this:

{'filter': {'term': {'language': 'EN'}}}

As you can see, this has the same structure as the one you ran. What happened when you called result.all()?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow