Elasticsearch clients for python, no solution

https://stackoverflow.com/questions/11797035

24-06-2021
|

문제

I am having a very bad week having chosen elasticsearch with graylog2. I am trying to run queries against the data in ES using Python.

I have tried following clients.

ESClient - Very weird results, I think its not maintained, query_body has no effect it returns all the results.
Pyes - Unreadable, undocumented. I have browsed sources and cant figure out how to run a simple query, maybe i am just not that smart. I can even run base queries in json format and then simply use the Python object/iterators to do my analysis on the results. But Pyes does not make it easy.
Elasticutils - Another documented, but without a complete sample. I get the following error with code attached. I don't even know how it uses this S() to connect to the right host?

es = get_es(hosts=HOST, default_indexes=[INDEX])

basic_s = S().indexes(INDEX).doctypes(DOCTYPE).values_dict()

results:

 print basic_s.query(message__text="login/delete")
  File "/usr/lib/python2.7/site-packages/elasticutils/__init__.py", line 223, in __repr__
    data = list(self)[:REPR_OUTPUT_SIZE + 1]
  File "/usr/lib/python2.7/site-packages/elasticutils/__init__.py", line 623, in __iter__
    return iter(self._do_search())
  File "/usr/lib/python2.7/site-packages/elasticutils/__init__.py", line 573, in _do_search
    hits = self.raw()
  File "/usr/lib/python2.7/site-packages/elasticutils/__init__.py", line 615, in raw
    hits = es.search(qs, self.get_indexes(), self.get_doctypes())
  File "/usr/lib/python2.7/site-packages/pyes/es.py", line 841, in search
    return self._query_call("_search", body, indexes, doc_types, **query_params)
  File "/usr/lib/python2.7/site-packages/pyes/es.py", line 251, in _query_call
    response = self._send_request('GET', path, body, querystring_args)
  File "/usr/lib/python2.7/site-packages/pyes/es.py", line 208, in _send_request
    response = self.connection.execute(request)
  File "/usr/lib/python2.7/site-packages/pyes/connection_http.py", line 167, in _client_call
    return getattr(conn.client, attr)(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/pyes/connection_http.py", line 59, in execute
    response = self.client.urlopen(Method._VALUES_TO_NAMES[request.method], uri, body=request.body, headers=request.headers)
  File "/usr/lib/python2.7/site-packages/pyes/urllib3/connectionpool.py", line 294, in urlopen
    return self.urlopen(method, url, body, headers, retries-1, redirect) # Try again
  File "/usr/lib/python2.7/site-packages/pyes/urllib3/connectionpool.py", line 294, in urlopen
    return self.urlopen(method, url, body, headers, retries-1, redirect) # Try again
  File "/usr/lib/python2.7/site-packages/pyes/urllib3/connectionpool.py", line 294, in urlopen
    return self.urlopen(method, url, body, headers, retries-1, redirect) # Try again
  File "/usr/lib/python2.7/site-packages/pyes/urllib3/connectionpool.py", line 294, in urlopen
    return self.urlopen(method, url, body, headers, retries-1, redirect) # Try again
  File "/usr/lib/python2.7/site-packages/pyes/urllib3/connectionpool.py", line 255, in urlopen
    raise MaxRetryError("Max retries exceeded for url: %s" % url)
pyes.urllib3.connectionpool.MaxRetryError: Max retries exceeded for url: /graylog2/message/_search

I wish the devs of this good projects would provide some complete examples. Even looking at sources I am t a complete loss.

Is there any solution, help out there for me with elasticsearch and python or should I just drop all of this and pay for a nice splunk account and end this misery.

I am proceeding with using curl, download the entire json result and json load it. Hope that works, though curl downloading 1 million messages from elasticsearch may not just happen.

해결책

Honestly, I've had the most luck with just CURLing everything. ES has so many different methods, filters, and queries that various "wrappers" have a hard time recreating all the functionality. In my view, it is similar to using an ORM for databases...what you gain in ease of use you lose in flexibility/raw power.

Except most of the wrappers for ES aren't really that easy to use.

I'd give CURL a try for a while and see how that treats you. You can use external JSON formatters to check your JSON, the mailing list to look for examples and the docs are ok if you use JSON.

다른 팁

I have found rawes to be quite usable: https://github.com/humangeo/rawes

It's a rather low-level interface but I have found it to be much less awkward to work with than the high-level ones. It also supports the Thrift RPC if you're into that.

Explicitly setting the host resolved that error for me:

basic_s = S().es(hosts=HOST, default_indexes=[INDEX])

FWIW, PYES docs are here: http://packages.python.org/pyes/index.html

Usage: http://packages.python.org/pyes/manual/usage.html

ElasticSearch recently (Sept 2013) released an official Python client elasticsearch-py (elasticsearch on PyPI, also on github), which is supposed to be a fairly direct mapping to the official ElasticSearch API. I haven't used it yet, but it looks promising, and at least it will match the official docs!

Edit: We started using it, and I'm very happy with it. ElasticSearch's API is pretty clean, and elasticsearch-py maintains that. Easier to work with and debug in general, plus decent logging.

ElasticUtils has sample code: http://elasticutils.readthedocs.org/en/latest/sampleprogram1.html

If there are other things you need in the docs, just ask.

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 StackOverflow