Question

I am evaluating different data from a textfile in a rather large algorithm.

If the text file contains more than a certain number of data points (the minimum I need is something like 1.3 million data points), it gives the following error:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
   at java.util.regex.Matcher.<init>(Unknown Source)
   at java.util.regex.Pattern.matcher(Unknown Source)
   at java.lang.String.replaceAll(Unknown Source)
   at java.util.Scanner.processFloatToken(Unknown Source)
   at java.util.Scanner.nextDouble(Unknown Source)

This happens when I run it in Eclipse with the following settings for the installed JRE 6 (standard VM):

-Xms20m -Xmx1024m -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=40 -XX:NewSize=10m 
-XX:MaxNewSize=10m -XX:SurvivorRatio=6 -XX:TargetSurvivorRatio=80 
-XX:+CMSClassUnloadingEnabled

Note that it works fine if I only run through part of the textfile.

I've read a lot about this subject, and it seems that somewhere I must either have a memory leak or be storing too much data in arrays (which I think I am).

Now my problem is: how can I work around this? Is it possible to change my settings such that I can still perform the computation or do I really need more computational power?

Solution

The really critical VM argument is -Xmx1024m, which tells the VM to use up to 1024 megabytes of heap memory. The simplest solution is to use a bigger number there. You can try -Xmx2048m or -Xmx4096m, or any larger value, assuming you have enough RAM in your machine to handle it.

I'm not sure you're getting much benefit out of any of the other VM args. For the most part, if you tell Java how much space to use, it will be smart with the rest of the params. I'd suggest removing everything except the -Xmx param and seeing how that performs.
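
To check that a changed -Xmx value actually reaches the program, a small check like the following can help. It is only a sketch: the class name HeapInfo and its method names are made up for illustration, and the figure reported by Runtime.maxMemory() may be somewhat lower than the -Xmx value, depending on the VM and garbage collector.

```java
public class HeapInfo {
    private static final long MB = 1024L * 1024L;

    /** Upper bound on the heap the JVM will try to use, in megabytes. */
    public static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    /** Heap currently occupied by objects (live or not yet collected), in megabytes. */
    public static long usedHeapMegabytes() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    public static void main(String[] args) {
        System.out.println("Max heap:  " + maxHeapMegabytes() + " MB");
        System.out.println("Used heap: " + usedHeapMegabytes() + " MB");
    }
}
```

Printing the used heap at a few points while the file is being read gives a rough idea of how quickly memory fills up before the error appears.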

A better solution is to try to improve your algorithm, but I haven't yet read through it in enough detail to offer any suggestions.
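
As one illustration of the kind of change that might help, and without knowing the actual algorithm: the stack trace shows the values being read with Scanner.nextDouble. If the input is plain whitespace-separated numbers (an assumption), they could be read into a primitive double[] instead, which costs 8 bytes per value (about 10.4 MB for 1.3 million values) and avoids the per-token regex work visible in the trace. The class and method names below are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;

public class PrimitiveReader {
    /** Reads whitespace-separated numbers into a compact double[]. */
    public static double[] readDoubles(Reader source) throws IOException {
        double[] values = new double[1024];
        int count = 0;
        BufferedReader in = new BufferedReader(source);
        try {
            String line;
            while ((line = in.readLine()) != null) {
                for (String token : line.trim().split("\\s+")) {
                    if (token.isEmpty()) {
                        continue; // blank line
                    }
                    if (count == values.length) {
                        // Grow by doubling so appends stay cheap on average.
                        values = Arrays.copyOf(values, values.length * 2);
                    }
                    values[count++] = Double.parseDouble(token);
                }
            }
        } finally {
            in.close();
        }
        // Trim the unused tail of the buffer.
        return Arrays.copyOf(values, count);
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a FileReader on the real text file.
        double[] data = readDoubles(new StringReader("1.5 2.5\n3.0\n"));
        System.out.println(data.length + " values read");
    }
}
```

Note that Double.parseDouble always expects a '.' as the decimal separator, whereas Scanner follows the locale, so this only applies if the file uses that format. If the algorithm can process the values one at a time, not storing them at all would save even more.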

OTHER TIPS

Since you say the data set is very large: if it does not fit in one computer's memory even after raising the -Xmx JVM argument, you may want to move to cluster computing, with many computers working on your problem. One way to do this is the Message Passing Interface (MPI).

MPJ Express is a very good implementation of MPI for Java; for languages like C/C++ there are good MPI implementations such as Open MPI and MPICH2. I am not sure whether it will help you in this situation, but it will certainly help you in future projects.

I suggest you

  • use a profiler to minimize your memory usage. I suspect you can reduce it by a factor of 10 or more by using primitives, binary data, and more compact collections.
  • increase the memory in your machine. The last time I did backtesting of hundreds of signals I had 256 GB of main memory, and this was barely enough at times. The more memory you can get, the better.
  • use memory-mapped files to increase memory efficiency.
  • reduce the size of your data set to something your machine and program can support.
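
To make the binary data and memory-mapped file suggestions above more concrete, here is a minimal sketch. It assumes the text file has been converted once into a binary file of raw 8-byte doubles; the class name MappedDoubles and its methods are hypothetical. A mapped file is paged in by the operating system, so the values do not occupy the Java heap.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.DoubleBuffer;
import java.nio.channels.FileChannel;

public class MappedDoubles {
    /** Writes the values as raw big-endian doubles, 8 bytes each. */
    public static void write(File file, double[] values) throws IOException {
        DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)));
        try {
            for (double v : values) {
                out.writeDouble(v);
            }
        } finally {
            out.close();
        }
    }

    /** Maps the file read-only and sums its values without loading them onto the heap. */
    public static double sum(File file) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        try {
            FileChannel channel = raf.getChannel();
            // ByteBuffer defaults to big-endian, matching DataOutputStream.
            DoubleBuffer doubles = channel
                    .map(FileChannel.MapMode.READ_ONLY, 0, channel.size())
                    .asDoubleBuffer();
            double total = 0.0;
            while (doubles.hasRemaining()) {
                total += doubles.get();
            }
            return total;
        } finally {
            raf.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("datapoints", ".bin");
        file.deleteOnExit();
        write(file, new double[] {1.0, 2.0, 3.5});
        System.out.println("Sum of mapped values: " + sum(file));
    }
}
```

A single mapping is limited to Integer.MAX_VALUE bytes (about 2 GB), so larger files would need to be mapped in several regions.
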
Licensed under: CC-BY-SA with attribution