Sorry for cross-posting. The following question is also posted on the Elasticsearch Google group.

In short, I am trying to find out why I am not getting optimal performance when searching an ES index that contains about 1.5 million records.

Currently I am able to run about 500-1000 searches in 2 seconds. I would think that this should be orders of magnitude faster. Also, I am currently not using Thrift.

Here is how I am checking the performance.

Using version 0.19.1 of pyes (tried both the stable and the dev version from GitHub).
Using version 0.13.8 of requests.

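import time
from pyes.es import ES
from pyes.query import TermQuery

conn = ES(['localhost:9201'], timeout=20, bulk_size=1000)
# time.time() is wall-clock time; on Unix, time.clock() reports CPU time only,
# which does not include time spent waiting on the server
loop_start = time.time()
q1 = TermQuery("tax_name", "cellvibrio")
for x in xrange(1000000):
    # Report the elapsed time after every 1000 searches
    if x % 1000 == 0 and x > 0:
        loop_check_point = time.time()
        print 'took %s secs to search %d records' % (loop_check_point - loop_start, x)

    results = conn.search(query=q1)
    if results:
        # Iterate over the hits so that each result is actually fetched and decoded
        for r in results:
            pass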

I would appreciate any help you can give me in scaling up the searches.

Thanks!

Solution

Isn't it just a matter of concurrency?

You're doing all your queries in sequence, so each query has to finish before the next one can start. If you have a 1 ms RTT to the server, this alone limits you to 1000 requests per second.

Try running a few instances of your script in parallel and see what kind of performance you get.
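
To illustrate the point without needing an ES server, here is a minimal sketch (written for Python 3) that compares sequential and concurrent requests. `fake_search` is a hypothetical stand-in for `conn.search(query=q1)`: it just sleeps for an assumed 5 ms round trip. The worker count is also an arbitrary choice, and whether a single pyes connection can safely be shared across threads is not something this sketch establishes.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SIMULATED_RTT = 0.005  # assumed round-trip time in seconds (illustrative only)


def fake_search(_):
    # Hypothetical stand-in for conn.search(query=q1):
    # the sleep plays the role of waiting on the server.
    time.sleep(SIMULATED_RTT)
    return 1


def run_sequential(n):
    # One request at a time: total time is roughly n * RTT.
    start = time.time()
    done = sum(fake_search(i) for i in range(n))
    return done, time.time() - start


def run_concurrent(n, workers):
    # Several requests in flight at once: waiting periods overlap.
    start = time.time()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        done = sum(pool.map(fake_search, range(n)))
    return done, time.time() - start


if __name__ == "__main__":
    print("sequential: %d searches in %.3f s" % run_sequential(40))
    print("concurrent: %d searches in %.3f s" % run_concurrent(40, 8))
```

With a real server the gain depends on how many requests ES can handle at once, so the numbers from this sketch only show the effect of overlapping the wait time.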

Other tips

There are several ways to improve this when using pyes.

  • First of all, try to get rid of the DottedDict class, which is used to turn every JSON dict into an object for every result you get.
  • Second, switch the JSON encoder to ujson.

These two changes improved performance considerably. The disadvantage is that you have to use dict-style access instead of the dotted version: instead of "result.facets.attribute.term" you have to write something like "result.facets['attribute']['term']" or "result.facets.get('attribute', {}).get('term', None)".

I did this by extending the ES class and replacing the "_send_request" function.
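
As a rough sketch of what the two changes look like on the decoding side (not the actual `_send_request` override, whose signature is not shown here): the response body is parsed with ujson when it is installed, falling back to the standard `json` module, and the result is kept as a plain dict. The response body and the `facet_term` helper below are made up for illustration.

```python
try:
    import ujson as json  # optional third-party decoder, if installed
except ImportError:
    import json  # standard library fallback

# Hypothetical response body, shaped loosely like an ES search response.
RAW_BODY = (
    '{"hits": {"total": 1, "hits": [{"_source": {"tax_name": "cellvibrio"}}]},'
    ' "facets": {"attribute": {"term": "cellvibrio"}}}'
)


def decode(body):
    # Keep the parsed response as plain dicts and lists (no wrapper objects).
    return json.loads(body)


def facet_term(result):
    # Dict-style access with defaults, so a missing key yields None
    # instead of raising KeyError.
    return result.get("facets", {}).get("attribute", {}).get("term")


if __name__ == "__main__":
    result = decode(RAW_BODY)
    print(facet_term(result))
    print(result["hits"]["hits"][0]["_source"]["tax_name"])
```

The chained `.get()` form is the safer of the two access styles mentioned above when a facet may be absent from the response.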

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow