Question

I've been struggling with this problem for quite a while, and I've now decided to post a question about it to seek some advice.

I'm testing an environment for indexing a large amount of data over time. Basically, every day I'll index logs and associated documents from various websites.

I want one index per website, to get a better logical division, to be able to filter queries, and to get faster responses.
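A minimal sketch of the one-index-per-website layout. The `logs-<site>` naming scheme is hypothetical (the question does not say how indexes are named). Elasticsearch index names must be lowercase, so the helper lowercases the host and, to be conservative, replaces anything outside `[a-z0-9_-]` with an underscore.

```python
import re

# Hypothetical prefix; the question does not say how indexes are named.
INDEX_PREFIX = "logs-"


def index_for_website(host: str) -> str:
    """Map a website host name to a per-website index name (hypothetical scheme)."""
    # Elasticsearch index names must be lowercase.
    name = host.strip().lower()
    # Conservatively keep only characters that are always safe in index names.
    name = re.sub(r"[^a-z0-9_-]", "_", name)
    return INDEX_PREFIX + name


if __name__ == "__main__":
    for site in ["Example.com", "blog.example.org"]:
        print(site, "->", index_for_website(site))
```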

I expect roughly 1 GB of data per day per index.

I'm testing the environment on an AWS instance with 2x 80 GB SSDs and 8 logical cores (CPU vendor: Intel, model: Xeon, 2500 MHz).
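A back-of-the-envelope check of the planned volume against this test machine's disks. It assumes, hypothetically, that production runs as many indexes as the 200-index test, and it ignores compression, replicas, index overhead and how the two SSDs are combined:

```python
GB_PER_DAY_PER_INDEX = 1.0   # from the question: ~1 GB per day per index
DISK_GB = 2 * 80             # 2x 80 GB SSDs on the test instance


def days_until_full(num_indexes: int,
                    gb_per_day_per_index: float = GB_PER_DAY_PER_INDEX,
                    disk_gb: float = DISK_GB) -> float:
    """Days of raw ingest that fit on disk (ignores compression, replicas, overhead)."""
    daily_gb = num_indexes * gb_per_day_per_index
    return disk_gb / daily_gb


if __name__ == "__main__":
    # Hypothetical: as many production indexes as in the 200-index test run.
    n = 200
    print(f"{n} indexes -> {n * GB_PER_DAY_PER_INDEX:.0f} GB/day, "
          f"disk full after ~{days_until_full(n):.1f} days")
```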

I have 1 node, x indexes and 1 shard per index (for the test environment), and I'm pushing documents taken from Wikipedia articles, randomly distributing them across 200 or more indexes.

I'm indexing documents from a queue with a single client, at a rate of about 200 docs/s. This is just for testing; once I fix the OOM issue, I'll increase the number of clients and the number of nodes.
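The test load described above can be sketched roughly as follows. This is a reconstruction, not the author's script: the `index_<n>` naming follows the index names that appear in the log below, while the `doc` type, the document shape and the batch size of 500 are assumptions. It only builds action dicts in the shape accepted by `elasticsearch.helpers.bulk`, so it runs without a cluster; batching through the bulk API is a common way to cut per-request overhead compared with one request per document.

```python
import random
from collections import Counter
from typing import Dict, Iterable, Iterator, List

NUM_INDEXES = 200  # the test spreads documents over 200 or more indexes


def bulk_actions(docs: Iterable[Dict], num_indexes: int = NUM_INDEXES,
                 seed: int = 0) -> Iterator[Dict]:
    """Assign each document to a random index_<n>, as in the test setup."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    for doc in docs:
        yield {
            "_index": "index_%d" % rng.randrange(num_indexes),
            "_type": "doc",  # assumption: ES 1.x needs a mapping type
            "_source": doc,
        }


def batches(actions: Iterable[Dict], size: int = 500) -> Iterator[List[Dict]]:
    """Group actions into fixed-size batches (hypothetical batch size)."""
    batch: List[Dict] = []
    for action in actions:
        batch.append(action)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    fake_docs = ({"title": "article %d" % i, "body": "..."} for i in range(10000))
    per_index = Counter()
    for batch in batches(bulk_actions(fake_docs)):
        # With a live cluster, each batch could be sent with
        # elasticsearch.helpers.bulk(es, batch)
        per_index.update(a["_index"] for a in batch)
    print(len(per_index), "indexes used,", sum(per_index.values()), "docs")
```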

I tried 500k docs and 200 indexes and it works fine; with 500k docs and 300 indexes, it throws the dreaded OOM error. 1M documents and 100 indexes also throws an OOM error. I'm now running tests with 1M documents and 200 indexes.

I tried changing the number of shards, and I also played with these parameters:

    index.merge.policy.max_merged_segment: 2g
    index.merge.policy.segments_per_tier: from 5 (and I reduced the max number too) up to 25
    index.merge.policy.max_merge_at_once: 8
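For reference, a sketch of how those merge-policy values could be supplied when an index is created. It assumes the settings are passed at creation time; it only builds the request body, which with elasticsearch-py could be passed as `es.indices.create(index=..., body=body)`. As I understand Lucene's TieredMergePolicy documentation, segments_per_tier should be at least max_merge_at_once, so the helper rejects the opposite combination.

```python
import json
from typing import Dict


def merge_policy_settings(max_merged_segment: str = "2g",
                          segments_per_tier: int = 5,
                          max_merge_at_once: int = 5,
                          number_of_shards: int = 1) -> Dict:
    """Build a create-index body with the merge settings tried in the question."""
    if max_merge_at_once > segments_per_tier:
        # TieredMergePolicy recommends segments_per_tier >= max_merge_at_once.
        raise ValueError("max_merge_at_once should not exceed segments_per_tier")
    return {
        "settings": {
            "index.number_of_shards": number_of_shards,
            "index.merge.policy.max_merged_segment": max_merged_segment,
            "index.merge.policy.segments_per_tier": segments_per_tier,
            "index.merge.policy.max_merge_at_once": max_merge_at_once,
        }
    }


if __name__ == "__main__":
    body = merge_policy_settings(segments_per_tier=25, max_merge_at_once=8)
    print(json.dumps(body, indent=2))
```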

My settings (retrieved with the /_nodes/settings,os,process,jvm?pretty command) are:

{ 
  "cluster_name" : "elasticsearch", 
  "nodes" : { 
    "QBk8YzISQPu-VVMnKvhEmQ" : { 
      "name" : "", 
      "transport_address" : "", 
      "host" : "", 
      "ip" : "", 
      "version" : "1.1.0", 
      "build" : "2181e11", 
      "http_address" : "", 
      "settings" : { 
        "path" : { 
          "data" : "/mnt/db/se_data/elasticsearch/", 
          "logs" : "/mnt/db/searchengines/elasticsearch-1.1.0/logs", 
          "home" : "/mnt/db/searchengines/elasticsearch-1.1.0" 
        }, 
        "cluster" : { 
          "name" : "elasticsearch" 
        }, 
        "index" : { 
          "number_of_shards" : "1" 
        }, 
        "foreground" : "yes", 
        "name" : "", 
        "max-open-files" : "true" 
      }, 
      "os" : { 
        "refresh_interval" : 1000, 
        "available_processors" : 8, 
        "cpu" : { 
          "vendor" : "Intel", 
          "model" : "Xeon", 
          "mhz" : 2500, 
          "total_cores" : 8, 
          "total_sockets" : 8, 
          "cores_per_socket" : 32, 
          "cache_size_in_bytes" : 25600 
        }, 
        "mem" : { 
          "total_in_bytes" : 31502180352 
        }, 
        "swap" : { 
          "total_in_bytes" : 3071995904 
        } 
      }, 
      "process" : { 
        "refresh_interval" : 1000, 
        "id" : 20155, 
        "max_file_descriptors" : 64000, 
        "mlockall" : false 
      }, 
      "jvm" : { 
        "pid" : 20155, 
        "version" : "1.7.0_51", 
        "vm_name" : "Java HotSpot(TM) 64-Bit Server VM", 
        "vm_version" : "24.51-b03", 
        "vm_vendor" : "Oracle Corporation", 
        "start_time" : 1398962098305, 
        "mem" : { 
          "heap_init_in_bytes" : 268435456, 
          "heap_max_in_bytes" : 10667687936, 
          "non_heap_init_in_bytes" : 24313856, 
          "non_heap_max_in_bytes" : 136314880, 
          "direct_max_in_bytes" : 10667687936 
        }, 
        "gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ], 
        "memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ] 
      } 
    } 
  } 
} 
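A small sketch that pulls the same node info using only the Python standard library and extracts the fields relevant to the memory discussion below: maximum heap, whether mlockall is enabled, and the file-descriptor limit. The localhost:9200 address is a hypothetical default. `summarize` works on the parsed JSON, so it can be tried on the values pasted above without a running cluster.

```python
import json
import urllib.request
from typing import Dict, List

# Hypothetical address; adjust to wherever the node listens.
NODES_URL = "http://localhost:9200/_nodes/settings,os,process,jvm"


def fetch_nodes_info(url: str = NODES_URL) -> Dict:
    """GET the node info endpoint and parse the JSON response."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(info: Dict) -> List[Dict]:
    """Per node: max heap in GiB, mlockall flag and max file descriptors."""
    rows = []
    for node_id, node in info.get("nodes", {}).items():
        rows.append({
            "node": node_id,
            "heap_max_gib": node["jvm"]["mem"]["heap_max_in_bytes"] / 2**30,
            "mlockall": node["process"]["mlockall"],
            "max_file_descriptors": node["process"]["max_file_descriptors"],
        })
    return rows


if __name__ == "__main__":
    # Values copied from the output above, so this runs without a cluster.
    sample = {"nodes": {"QBk8YzISQPu-VVMnKvhEmQ": {
        "jvm": {"mem": {"heap_max_in_bytes": 10667687936}},
        "process": {"mlockall": False, "max_file_descriptors": 64000},
    }}}
    for row in summarize(sample):
        print(row)
```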

Here are the JVM settings: -Xms256m -Xmx10g -Xss256k -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Delasticsearch -Des.foreground=yes

The maximum number of open files is set to 65k (this was the previous issue I had: the number of allowed open files was only 4k).

Checking memory consumption with BigDesk, I see the committed/used heap memory growing steadily until the crash point. I think this is what causes the OOM error, even though the crash happens when usage reaches ~4-6 GB, against a maximum Java heap of 10 GB.

Could it be a GC problem? Should I follow the settings from this article to tune it?

If I issue a flush() every, say, 10k documents inserted into any index, could it help reduce memory usage?
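A sketch of the "flush every 10k documents per index" idea. Whether an explicit flush would reduce heap usage here is not established in this thread; a flush commits Lucene segments and trims the translog, and the sketch only shows the client-side mechanics. The indexing and flush calls are injected as callables so the logic runs without a cluster; with elasticsearch-py they could wrap `es.index(...)` and `es.indices.flush(index=...)`.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

FLUSH_EVERY = 10000  # "every, say, 10k inserted documents"


def index_with_periodic_flush(
    items: Iterable[Tuple[str, Dict]],
    index_doc: Callable[[str, Dict], None],
    flush_index: Callable[[str], None],
    flush_every: int = FLUSH_EVERY,
) -> Dict[str, int]:
    """Index (index_name, doc) pairs and flush an index after every
    `flush_every` documents sent to it. Returns flush counts per index."""
    pending: Dict[str, int] = defaultdict(int)
    flushes: Dict[str, int] = defaultdict(int)
    for index_name, doc in items:
        index_doc(index_name, doc)
        pending[index_name] += 1
        if pending[index_name] >= flush_every:
            flush_index(index_name)
            flushes[index_name] += 1
            pending[index_name] = 0
    return dict(flushes)


if __name__ == "__main__":
    # Stand-ins for e.g. es.index(...) and es.indices.flush(index=...).
    sent = []
    result = index_with_periodic_flush(
        (("index_%d" % (i % 3), {"n": i}) for i in range(30000)),
        index_doc=lambda name, doc: sent.append(name),
        flush_index=lambda name: None,
    )
    print(len(sent), "docs sent, flushes per index:", result)
```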

Is the number of indexes too high? Should I change the approach? Putting logs from different websites into a single index seems to be the most plausible solution; is it?
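One common way to keep per-website filtering while using a single index is a filtered alias per website. A sketch, assuming a shared index named `logs`, a `website` field mapped as a not_analyzed string, and `site_<name>` alias names (all hypothetical). It builds the body for `POST /_aliases`. The alias `routing` value applies when indexing and searching through the alias, which keeps one site's documents on a single shard.

```python
import json
from typing import Dict, List

SHARED_INDEX = "logs"  # hypothetical name of the single shared index


def alias_actions(websites: List[str], index: str = SHARED_INDEX) -> Dict:
    """Body for POST /_aliases: one filtered alias per website on a shared index."""
    actions = []
    for site in websites:
        actions.append({
            "add": {
                "index": index,
                "alias": "site_" + site,               # hypothetical alias naming
                "filter": {"term": {"website": site}},  # needs a not_analyzed field
                "routing": site,  # used when indexing/searching through the alias
            }
        })
    return {"actions": actions}


def log_doc(website: str, message: str) -> Dict:
    """Each document carries the website it came from (hypothetical field name)."""
    return {"website": website, "message": message}


if __name__ == "__main__":
    print(json.dumps(alias_actions(["example_com", "example_org"]), indent=2))
```

Queries sent to `site_example_com` would then only see documents whose `website` is `example_com`, while the node hosts one index instead of hundreds.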

PS: I'm using a Python script to test the environment, with the elasticsearch Python module (should I use pyelasticsearch instead? It should just be a more feature-rich module, right?).

Let me know if you need more info. Thank you for your time!

EDIT: Here is the stack trace from the ES log file, from an attempt to load 500k documents into 300 indexes:

[2014-05-01 00:16:49,526][INFO ][node                     ] [Captain Wings] stopping ...
[2014-05-01 00:16:49,767][WARN ][index.shard.service      ] [Captain Wings] [index_220][0] Failed to perform scheduled engine refresh
org.elasticsearch.index.engine.RefreshFailedEngineException: [index_220][0] Refresh failed
    at org.elasticsearch.index.engine.internal.InternalEngine.refresh(InternalEngine.java:725)
    at org.elasticsearch.index.shard.service.InternalIndexShard.refresh(InternalIndexShard.java:469)
    at org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher$1.run(InternalIndexShard.java:920)
    [...]
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.close(CompressingStoredFieldsWriter.java:138)
    [...]
    ... 5 more
    Suppressed: java.io.FileNotFoundException: _75.fdx
        at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:261)
        at org.apache.lucene.store.FilterDirectory.fileLength(FilterDirectory.java:63)
        at org.elasticsearch.index.store.Store$StoreIndexOutput.close(Store.java:611)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsIndexWriter.close(CompressingStoredFieldsIndexWriter.java:205)
        ... 24 more
[2014-05-01 00:16:49,804][WARN ][index.merge.scheduler    ] [Captain Wings] [index_65][0] failed to merge
java.io.FileNotFoundException: _7n_es090_0.tim
    at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:261)
    at org.apache.lucene.store.FilterDirectory.fileLength(FilterDirectory.java:63)
    at org.elasticsearch.index.store.Store$StoreIndexOutput.close(Store.java:611)
    at org.apache.lucene.util.IOUtils.closeWhileHandlingException(IOUtils.java:81)
    at org.apache.lucene.codecs.BlockTreeTermsWriter.close(BlockTreeTermsWriter.java:1140)
    at org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat$BloomFilteredFieldsConsumer.close(BloomFilterPostingsFormat.java:371)
    at org.elasticsearch.index.codec.postingsformat.Elasticsearch090PostingsFormat$1.close(Elasticsearch090PostingsFormat.java:61)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsConsumerAndSuffix.close(PerFieldPostingsFormat.java:86)
    at org.apache.lucene.util.IOUtils.close(IOUtils.java:163)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.close(PerFieldPostingsFormat.java:154)
    at org.apache.lucene.util.IOUtils.close(IOUtils.java:140)
    at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:389)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:106)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4119)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3716)
    at org.apache.lucene.index.TrackingSerialMergeScheduler.merge(TrackingSerialMergeScheduler.java:122)
    at org.elasticsearch.index.merge.scheduler.SerialMergeSchedulerProvider$CustomSerialMergeScheduler.merge(SerialMergeSchedulerProvider.java:89)
    at org.elasticsearch.index.merge.EnableMergeScheduler.merge(EnableMergeScheduler.java:71)
    at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1936)
    at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1930)
    at org.elasticsearch.index.merge.Merges.maybeMerge(Merges.java:47)
    at org.elasticsearch.index.engine.internal.InternalEngine.maybeMerge(InternalEngine.java:926)
    at org.elasticsearch.index.shard.service.InternalIndexShard$EngineMerger$1.run(InternalIndexShard.java:966)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
    Suppressed: java.io.FileNotFoundException: _7n_es090_0.tip
        ... 26 more
    Suppressed: java.io.FileNotFoundException: _7n_es090_0.doc
        at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:261)
        at org.apache.lucene.store.FilterDirectory.fileLength(FilterDirectory.java:63)
        at org.elasticsearch.index.store.Store$StoreIndexOutput.close(Store.java:611)
        at org.apache.lucene.util.IOUtils.close(IOUtils.java:140)
        at org.apache.lucene.codecs.lucene41.Lucene41PostingsWriter.close(Lucene41PostingsWriter.java:587)
        ... 23 more
        Suppressed: java.io.FileNotFoundException: _7n_es090_0.pos
            ... 28 more
[2014-05-01 00:16:49,807][WARN ][index.engine.internal    ] [Captain Wings] [index_65][0] failed engine
org.apache.lucene.index.MergePolicy$MergeException: java.io.FileNotFoundException: _7n_es090_0.tim
    at org.elasticsearch.index.merge.scheduler.SerialMergeSchedulerProvider$CustomSerialMergeScheduler.merge(SerialMergeSchedulerProvider.java:92)
    at org.elasticsearch.index.merge.EnableMergeScheduler.merge(EnableMergeScheduler.java:71)
    at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1936)
    at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1930)
    at org.elasticsearch.index.merge.Merges.maybeMerge(Merges.java:47)
    at org.elasticsearch.index.engine.internal.InternalEngine.maybeMerge(InternalEngine.java:926)
    at org.elasticsearch.index.shard.service.InternalIndexShard$EngineMerger$1.run(InternalIndexShard.java:966)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.FileNotFoundException: _7n_es090_0.tim
    at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:261)
    at org.apache.lucene.store.FilterDirectory.fileLength(FilterDirectory.java:63)
    at org.elasticsearch.index.store.Store$StoreIndexOutput.close(Store.java:611)
    at org.apache.lucene.util.IOUtils.closeWhileHandlingException(IOUtils.java:81)
    at org.apache.lucene.codecs.BlockTreeTermsWriter.close(BlockTreeTermsWriter.java:1140)
    at org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat$BloomFilteredFieldsConsumer.close(BloomFilterPostingsFormat.java:371)
    at org.elasticsearch.index.codec.postingsformat.Elasticsearch090PostingsFormat$1.close(Elasticsearch090PostingsFormat.java:61)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsConsumerAndSuffix.close(PerFieldPostingsFormat.java:86)
    at org.apache.lucene.util.IOUtils.close(IOUtils.java:163)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.close(PerFieldPostingsFormat.java:154)
    at org.apache.lucene.util.IOUtils.close(IOUtils.java:140)
    at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:389)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:106)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4119)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3716)
    at org.apache.lucene.index.TrackingSerialMergeScheduler.merge(TrackingSerialMergeScheduler.java:122)
    at org.elasticsearch.index.merge.scheduler.SerialMergeSchedulerProvider$CustomSerialMergeScheduler.merge(SerialMergeSchedulerProvider.java:89)
    ... 9 more
    Suppressed: java.io.FileNotFoundException: _7n_es090_0.tip
        ... 26 more
    Suppressed: java.io.FileNotFoundException: _7n_es090_0.doc
        at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:261)
        at org.apache.lucene.store.FilterDirectory.fileLength(FilterDirectory.java:63)
        at org.elasticsearch.index.store.Store$StoreIndexOutput.close(Store.java:611)
        at org.apache.lucene.util.IOUtils.close(IOUtils.java:140)
        at org.apache.lucene.codecs.lucene41.Lucene41PostingsWriter.close(Lucene41PostingsWriter.java:587)
        ... 23 more
        Suppressed: java.io.FileNotFoundException: _7n_es090_0.pos
            ... 28 more
[2014-05-01 00:16:49,807][WARN ][index.shard.service      ] [Captain Wings] [index_65][0] Failed to perform scheduled engine optimize/merge
org.elasticsearch.index.engine.OptimizeFailedEngineException: [index_65][0] Optimize failed
    at org.elasticsearch.index.engine.internal.InternalEngine.maybeMerge(InternalEngine.java:936)
    at org.elasticsearch.index.shard.service.InternalIndexShard$EngineMerger$1.run(InternalIndexShard.java:966)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.lucene.index.MergePolicy$MergeException: java.io.FileNotFoundException: _7n_es090_0.tim
    at org.elasticsearch.index.merge.scheduler.SerialMergeSchedulerProvider$CustomSerialMergeScheduler.merge(SerialMergeSchedulerProvider.java:93)
    at org.elasticsearch.index.merge.EnableMergeScheduler.merge(EnableMergeScheduler.java:71)
    at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1936)
    at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1930)
    at org.elasticsearch.index.merge.Merges.maybeMerge(Merges.java:47)
    at org.elasticsearch.index.engine.internal.InternalEngine.maybeMerge(InternalEngine.java:926)
    ... 4 more
Caused by: java.io.FileNotFoundException: _7n_es090_0.tim
    at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:261)
    at org.apache.lucene.store.FilterDirectory.fileLength(FilterDirectory.java:63)
    at org.elasticsearch.index.store.Store$StoreIndexOutput.close(Store.java:611)
    at org.apache.lucene.util.IOUtils.closeWhileHandlingException(IOUtils.java:81)
    at org.apache.lucene.codecs.BlockTreeTermsWriter.close(BlockTreeTermsWriter.java:1140)
    at org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat$BloomFilteredFieldsConsumer.close(BloomFilterPostingsFormat.java:371)
    at org.elasticsearch.index.codec.postingsformat.Elasticsearch090PostingsFormat$1.close(Elasticsearch090PostingsFormat.java:61)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsConsumerAndSuffix.close(PerFieldPostingsFormat.java:86)
    at org.apache.lucene.util.IOUtils.close(IOUtils.java:163)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.close(PerFieldPostingsFormat.java:154)
    at org.apache.lucene.util.IOUtils.close(IOUtils.java:140)
    at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:389)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:106)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4119)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3716)
    at org.apache.lucene.index.TrackingSerialMergeScheduler.merge(TrackingSerialMergeScheduler.java:122)
    at org.elasticsearch.index.merge.scheduler.SerialMergeSchedulerProvider$CustomSerialMergeScheduler.merge(SerialMergeSchedulerProvider.java:89)
    ... 9 more
    Suppressed: java.io.FileNotFoundException: _7n_es090_0.tip
        ... 26 more
    Suppressed: java.io.FileNotFoundException: _7n_es090_0.doc
        at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:261)
        at org.apache.lucene.store.FilterDirectory.fileLength(FilterDirectory.java:63)
        at org.elasticsearch.index.store.Store$StoreIndexOutput.close(Store.java:611)
        at org.apache.lucene.util.IOUtils.close(IOUtils.java:140)
        at org.apache.lucene.codecs.lucene41.Lucene41PostingsWriter.close(Lucene41PostingsWriter.java:587)
        ... 23 more
        Suppressed: java.io.FileNotFoundException: _7n_es090_0.pos
            ... 28 more
[2014-05-01 00:16:49,913][WARN ][cluster.action.shard     ] [Captain Wings] [index_65][0] sending failed shard for [index_65][0], node[Fj243PXdSNGtSm_jLNl9hQ], [P], s[STARTED], indexUUID [v6ZQJKpTRbiZZ5e6Q6ENyw], reason [engine failure, message [MergeException[java.io.FileNotFoundException: _7n_es090_0.tim]; nested: FileNotFoundException[_7n_es090_0.tim]; ]]
[2014-05-01 00:16:49,913][WARN ][cluster.action.shard     ] [Captain Wings] [index_65][0] received shard failed for [index_65][0], node[Fj243PXdSNGtSm_jLNl9hQ], [P], s[STARTED], indexUUID [v6ZQJKpTRbiZZ5e6Q6ENyw], reason [engine failure, message [MergeException[java.io.FileNotFoundException: _7n_es090_0.tim]; nested: FileNotFoundException[_7n_es090_0.tim]; ]]
[2014-05-01 00:17:03,453][WARN ][index.merge.scheduler    ] [Captain Wings] [index_221][0] failed to merge
java.io.FileNotFoundException: _7e_es090_0.tim
    at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:261)
    at org.apache.lucene.store.FilterDirectory.fileLength(FilterDirectory.java:63)
    at org.elasticsearch.index.store.Store$StoreIndexOutput.close(Store.java:611)
    at org.apache.lucene.util.IOUtils.closeWhileHandlingException(IOUtils.java:81)
    at org.apache.lucene.codecs.BlockTreeTermsWriter.close(BlockTreeTermsWriter.java:1140)
    at org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat$BloomFilteredFieldsConsumer.close(BloomFilterPostingsFormat.java:371)
    at org.elasticsearch.index.codec.postingsformat.Elasticsearch090PostingsFormat$1.close(Elasticsearch090PostingsFormat.java:61)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsConsumerAndSuffix.close(PerFieldPostingsFormat.java:86)
    at org.apache.lucene.util.IOUtils.close(IOUtils.java:163)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.close(PerFieldPostingsFormat.java:154)
    at org.apache.lucene.util.IOUtils.close(IOUtils.java:140)
    at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:389)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:106)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4119)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3716)
    at org.apache.lucene.index.TrackingSerialMergeScheduler.merge(TrackingSerialMergeScheduler.java:122)
    at org.elasticsearch.index.merge.scheduler.SerialMergeSchedulerProvider$CustomSerialMergeScheduler.merge(SerialMergeSchedulerProvider.java:89)
    at org.elasticsearch.index.merge.EnableMergeScheduler.merge(EnableMergeScheduler.java:71)
    at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1936)
    at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1930)
    at org.elasticsearch.index.merge.Merges.maybeMerge(Merges.java:47)
    at org.elasticsearch.index.engine.internal.InternalEngine.maybeMerge(InternalEngine.java:926)
    at org.elasticsearch.index.shard.service.InternalIndexShard$EngineMerger$1.run(InternalIndexShard.java:966)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
    Suppressed: java.io.FileNotFoundException: _7e_es090_0.tip
        ... 26 more
    Suppressed: java.io.FileNotFoundException: _7e_es090_0.doc
        at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:261)
        at org.apache.lucene.store.FilterDirectory.fileLength(FilterDirectory.java:63)
        at org.elasticsearch.index.store.Store$StoreIndexOutput.close(Store.java:611)
        at org.apache.lucene.util.IOUtils.close(IOUtils.java:140)
        at org.apache.lucene.codecs.lucene41.Lucene41PostingsWriter.close(Lucene41PostingsWriter.java:587)
        ... 23 more
        Suppressed: java.io.FileNotFoundException: _7e_es090_0.pos
            ... 28 more
[2014-05-01 00:17:03,454][WARN ][index.engine.internal    ] [Captain Wings] [index_221][0] failed engine
org.apache.lucene.index.MergePolicy$MergeException: java.io.FileNotFoundException: _7e_es090_0.tim
    at org.elasticsearch.index.merge.scheduler.SerialMergeSchedulerProvider$CustomSerialMergeScheduler.merge(SerialMergeSchedulerProvider.java:92)
    at org.elasticsearch.index.merge.EnableMergeScheduler.merge(EnableMergeScheduler.java:71)
    at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1936)
    at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1930)
    at org.elasticsearch.index.merge.Merges.maybeMerge(Merges.java:47)
    at org.elasticsearch.index.engine.internal.InternalEngine.maybeMerge(InternalEngine.java:926)
    at org.elasticsearch.index.shard.service.InternalIndexShard$EngineMerger$1.run(InternalIndexShard.java:966)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.FileNotFoundException: _7e_es090_0.tim
    at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:261)
    at org.apache.lucene.store.FilterDirectory.fileLength(FilterDirectory.java:63)
    at org.elasticsearch.index.store.Store$StoreIndexOutput.close(Store.java:611)
    at org.apache.lucene.util.IOUtils.closeWhileHandlingException(IOUtils.java:81)
    at org.apache.lucene.codecs.BlockTreeTermsWriter.close(BlockTreeTermsWriter.java:1140)
    at org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat$BloomFilteredFieldsConsumer.close(BloomFilterPostingsFormat.java:371)
    at org.elasticsearch.index.codec.postingsformat.Elasticsearch090PostingsFormat$1.close(Elasticsearch090PostingsFormat.java:61)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsConsumerAndSuffix.close(PerFieldPostingsFormat.java:86)
    at org.apache.lucene.util.IOUtils.close(IOUtils.java:163)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.close(PerFieldPostingsFormat.java:154)
    at org.apache.lucene.util.IOUtils.close(IOUtils.java:140)
    at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:389)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:106)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4119)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3716)
    at org.apache.lucene.index.TrackingSerialMergeScheduler.merge(TrackingSerialMergeScheduler.java:122)
    at org.elasticsearch.index.merge.scheduler.SerialMergeSchedulerProvider$CustomSerialMergeScheduler.merge(SerialMergeSchedulerProvider.java:89)
    ... 9 more
    Suppressed: java.io.FileNotFoundException: _7e_es090_0.tip
        ... 26 more
    Suppressed: java.io.FileNotFoundException: _7e_es090_0.doc
        at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:261)
        at org.apache.lucene.store.FilterDirectory.fileLength(FilterDirectory.java:63)
        at org.elasticsearch.index.store.Store$StoreIndexOutput.close(Store.java:611)
        at org.apache.lucene.util.IOUtils.close(IOUtils.java:140)
        at org.apache.lucene.codecs.lucene41.Lucene41PostingsWriter.close(Lucene41PostingsWriter.java:587)
        ... 23 more
        Suppressed: java.io.FileNotFoundException: _7e_es090_0.pos
            ... 28 more
[2014-05-01 00:17:06,961][INFO ][node                     ] [Captain Wings] stopped
[2014-05-01 00:17:06,961][INFO ][node                     ] [Captain Wings] closing ...
[2014-05-01 00:17:06,966][INFO ][node                     ] [Captain Wings] closed

EDIT: another run, storing 3M documents over 100 indexes. Here is a screenshot I took from BigDesk when ES crashed: BigDesk Capture

Here is the stack trace (hosted externally since it was too big to paste): http://m.uploadedit.com/b034/1399053538319.txt

It stopped at ~500k docs. I was checking the open file descriptors for that process, and there were fewer than 5000. The JVM heap (-Xmx) is 20 GB, with 5 shards...
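Each shard is its own Lucene index, with its own segments and files, so the number of shards the single node hosts is a useful figure to compare across runs. A small sketch tabulating the runs reported in this question; treating "5 shards" in the 3M-document run as 5 shards per index is an assumption.

```python
# (planned documents, indexes, shards per index, outcome) as reported in the question.
RUNS = [
    (500_000, 200, 1, "ok"),
    (500_000, 300, 1, "OOM"),
    (1_000_000, 100, 1, "OOM"),
    # Assumption: "5 shards" in the 3M-doc run means 5 shards per index.
    (3_000_000, 100, 5, "crashed at ~500k docs"),
]


def total_shards(indexes: int, shards_per_index: int) -> int:
    """Shards the single node has to host. Replicas are never allocated on the
    same node as their primary, so on one node only primaries count."""
    return indexes * shards_per_index


if __name__ == "__main__":
    for docs, indexes, shards, outcome in RUNS:
        print(f"{docs:>9,} docs, {indexes} indexes x {shards} shard(s) = "
              f"{total_shards(indexes, shards)} shards -> {outcome}")
```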

Storing 3M docs in a single index, on the other hand, works fine.

EDIT: ulimit -a output:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 240150
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

The open-files limit has been raised to 64000 in /etc/security/limits.conf; it just does not show up here. I was monitoring the number of open files in the /proc//fd/ folder: it went above 1024 (I think it was around 6k for 100 indexes) but stayed well below 64k.
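A Linux-only sketch of that monitoring. It counts entries in `/proc/<pid>/fd` and reads the limit the running process actually received from `/proc/<pid>/limits`, which can differ from what `ulimit -a` prints in a fresh shell. Reading another user's `/proc` entries may require running as the same user or as root.

```python
import os
import sys
from typing import Tuple


def open_fd_count(pid: int) -> int:
    """Number of open file descriptors of `pid` (entries in /proc/<pid>/fd)."""
    return len(os.listdir("/proc/%d/fd" % pid))


def max_open_files(pid: int) -> Tuple[str, str]:
    """(soft, hard) 'Max open files' limits of `pid`, read from /proc/<pid>/limits."""
    with open("/proc/%d/limits" % pid) as f:
        for line in f:
            if line.startswith("Max open files"):
                fields = line.split()  # ['Max', 'open', 'files', soft, hard, 'files']
                return fields[3], fields[4]
    raise RuntimeError("no 'Max open files' line found")


if __name__ == "__main__":
    # Pass the Elasticsearch PID as an argument; defaults to this script's own PID.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    soft, hard = max_open_files(pid)
    print(f"pid {pid}: {open_fd_count(pid)} open fds (limit soft={soft}, hard={hard})")
```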

EDIT: Result of http://54.227.158.137:9200/_nodes/stats/indices?pretty before a crash:

{
  "cluster_name" : "elasticsearch",
  "nodes" : {
    "t5FjNo1xQbCk97Qqv0w2sQ" : {
      "timestamp" : 1399395162423,
      "name" : "Dementia",
      "transport_address" : "inet[/10.69.21.196:9300]",
      "host" : "ip-10-69-21-196",
      "ip" : [ "inet[/10.69.21.196:9300]", "NONE" ],
      "indices" : {
        "docs" : {
          "count" : 433533,
          "deleted" : 0
        },
        "store" : {
          "size_in_bytes" : 10837028821,
          "throttle_time_in_millis" : 1971123
        },
        "indexing" : {
          "index_total" : 435229,
          "index_time_in_millis" : 1651288,
          "index_current" : 0,
          "delete_total" : 0,
          "delete_time_in_millis" : 0,
          "delete_current" : 0
        },
        [...]
        "merges" : {
          "current" : 4,
          "current_docs" : 574,
          "current_size_in_bytes" : 10968915,
          "total" : 13935,
          "total_time_in_millis" : 9056598,
          "total_docs" : 635889,
          "total_size_in_bytes" : 15384667490
        },
        "refresh" : {
          "total" : 161452,
          "total_time_in_millis" : 9013814
        },
        "flush" : {
          "total" : 1000,
          "total_time_in_millis" : 730342
        },
        "warmer" : {
          "current" : 0,
          "total" : 176035,
          "total_time_in_millis" : 18410
        },
        "filter_cache" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0
        },
        "id_cache" : {
          "memory_size_in_bytes" : 0
        },
        "fielddata" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0
        },
        "percolate" : {
          "total" : 0,
          "time_in_millis" : 0,
          "current" : 0,
          "memory_size_in_bytes" : -1,
          "memory_size" : "-1b",
          "queries" : 0
        },
        "completion" : {
          "size_in_bytes" : 0
        },
        "segments" : {
          "count" : 36091,
          "memory_in_bytes" : 76230702
        },
        "translog" : {
          "operations" : 80160,
          "size_in_bytes" : 1070283
        }
      }
    }
  }
}
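A few ratios can be derived from those stats. A sketch that computes them from the `indices` section; with the numbers above it gives roughly 12 documents per segment, about 25 KB of store per document, and about 73 MiB of segment memory. The sample dict copies only the fields the function reads.

```python
from typing import Dict


def stats_summary(indices: Dict) -> Dict[str, float]:
    """Derive a few ratios from the "indices" section of _nodes/stats/indices."""
    docs = indices["docs"]["count"]
    segments = indices["segments"]["count"]
    return {
        "docs_per_segment": docs / segments,
        "store_bytes_per_doc": indices["store"]["size_in_bytes"] / docs,
        "segments_memory_mib": indices["segments"]["memory_in_bytes"] / 2**20,
        "fielddata_bytes": indices["fielddata"]["memory_size_in_bytes"],
        "filter_cache_bytes": indices["filter_cache"]["memory_size_in_bytes"],
    }


if __name__ == "__main__":
    # Numbers copied from the stats output above.
    sample = {
        "docs": {"count": 433533},
        "store": {"size_in_bytes": 10837028821},
        "segments": {"count": 36091, "memory_in_bytes": 76230702},
        "fielddata": {"memory_size_in_bytes": 0},
        "filter_cache": {"memory_size_in_bytes": 0},
    }
    for key, value in stats_summary(sample).items():
        print(f"{key}: {value:,.1f}")
```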


Solution

Have you tuned your fielddata cache? It is unbounded by default, and with this many indexes it is not surprising that you run out of memory. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html

Update: this apparently solved the problem:

The OOM is happening with memory-mapped files. This type of memory is allocated outside the regular JVM heap and is also outside the scope of garbage collection. Have a look at this thread: groups.google.com/forum/#!topic/elasticsearch/4Nj_HUl78KA for some things that might work. Apparently Linux limits the memory available for this by default, and you should be able to remove this restriction.
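The linked thread is not reproduced here, so which Linux limit it refers to is an assumption. The `ulimit -a` output in the question already shows virtual memory as unlimited; another commonly cited limit for memory-mapped files is the per-process number of mappings, the sysctl `vm.max_map_count` (65530 by default on many kernels). A Linux-only sketch comparing a process's current mapping count with that limit:

```python
import os
import sys


def max_map_count() -> int:
    """Per-process limit on memory mappings (sysctl vm.max_map_count)."""
    with open("/proc/sys/vm/max_map_count") as f:
        return int(f.read().strip())


def map_count(pid: int) -> int:
    """Current number of memory mappings of `pid`: one line per mapping in /proc/<pid>/maps."""
    with open("/proc/%d/maps" % pid) as f:
        return sum(1 for _ in f)


if __name__ == "__main__":
    # Pass the Elasticsearch PID as an argument; defaults to this script's own PID.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    used, limit = map_count(pid), max_map_count()
    print(f"pid {pid}: {used} mappings of {limit} allowed ({100.0 * used / limit:.1f}%)")
```

If the count for the Elasticsearch process approaches the limit, it can be raised as root with `sysctl -w vm.max_map_count=<value>`.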

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow