Question

I am using Elasticsearch to index some data, but the performance is poor.

There are only 3000 entries, each with 6 columns, yet it takes 5 minutes to index them.

Because I am new to Elasticsearch, my code and program flow are basic:

  1. Search to check whether the same data already exists.
  2. If it exists, update it.
  3. If not, add it.

The code is as follows:

import pyes

conn = pyes.ES('server:9200')

Search:

searchResult = conn.search(searchDict, indexName, TypeName)

Index:

conn.index(storeDict, indexName, TypeName, id)

Update the Count field in the indexed document:

conn.partial_update(indexName, TypeName, id, "ctx._source.Count += counter", params={"counter" : 1})

Is there any way to improve the performance of my code?

Thank you for your help.

Solution

You don't need to search before updating. See the Elasticsearch documentation on the update API, in particular the upsert section. upsert is a parameter that holds the document to index if the document does not yet exist on the server. If the document already exists, upsert is ignored and the request behaves like a normal update (as you are doing now).
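
As a rough sketch of what such a request could look like, the snippet below builds the JSON body for an update with an upsert fallback, using only the standard library. The helper name `build_upsert_body` and the sample field `Name` are made up for illustration; the script and the `Count` field come from the question. The top-level `script`/`params` layout mirrors the style used in the question. Newer Elasticsearch releases expect the script as a nested object and refer to parameters as `params.counter`, so check the documentation for your version.

```python
import json


def build_upsert_body(new_doc, counter=1):
    """Build the body of an update request that falls back to an upsert.

    If the document already exists, the script increments its Count field.
    If it does not exist, Elasticsearch indexes new_doc as-is instead.
    """
    return {
        "script": "ctx._source.Count += counter",
        "params": {"counter": counter},
        "upsert": new_doc,
    }


# Hypothetical document; only the Count field is taken from the question.
body = build_upsert_body({"Name": "example", "Count": 1})
print(json.dumps(body, indent=2))
```

A body like this would be sent to the update endpoint for a given document ID, so the separate search, index and partial_update calls collapse into one round trip per document. How to pass it through pyes depends on the client version, so that part is left out here.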

Good luck!

Other tips

  • You can use Elasticsearch's versioning feature. If you assign the document IDs yourself, this is easy: indexing a document under an existing ID simply re-indexes it, replacing the stored document and incrementing its version.

  • You should use the bulk API for indexing (batches of 1000-5000 documents work well).

  • Another possible cause of poor performance is the configuration in config/elasticsearch.yml. Tuning those settings can improve indexing performance.
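
To illustrate the bulk tip, here is a minimal sketch that batches documents and serializes each batch in the newline-delimited JSON format the `_bulk` endpoint accepts. It only builds the payloads and does not send them. The helper names `chunked` and `build_bulk_payload`, the index name `my_index`, and the sample documents are all assumptions for illustration.

```python
import json


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_bulk_payload(docs, index_name, doc_type=None):
    """Serialize (doc_id, source) pairs as newline-delimited JSON for _bulk.

    Each document becomes two lines: an action line and the source line.
    """
    lines = []
    for doc_id, source in docs:
        meta = {"_index": index_name, "_id": doc_id}
        if doc_type is not None:
            # Older Elasticsearch versions also expect a mapping type.
            meta["_type"] = doc_type
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# 3000 hypothetical entries, as in the question, sent in batches of 1000.
entries = [(str(i), {"Name": "entry %d" % i, "Count": 1}) for i in range(3000)]
payloads = [build_bulk_payload(batch, "my_index") for batch in chunked(entries, 1000)]
print(len(payloads), "bulk requests instead of", len(entries), "single requests")
```

Note that the `index` action replaces any existing document with the same ID. If the `Count` increment from the question has to be preserved, the bulk format also has an `update` action whose second line can carry a `script` and an `upsert` document.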

License: CC-BY-SA with attribution
Not affiliated with StackOverflow