Question

I'm experiencing a problem with a producer-consumer setup for a local bot competition (think Scalatron, but with more languages allowed, and using pipes to connect with stdin and stdout). The items are produced fine and handled correctly by the consumer. However, the consumer's task in this setting is to call other pieces of software that might use too much memory, hence the out-of-memory error.

I've got a Python script (i.e. the consumer) continuously calling other pieces of code using subprocess.call. These are all submitted by other people for evaluation. Sometimes, however, one of these submissions uses so much memory that the engine produces an OutOfMemoryError, which causes the entire script to halt.

There are three layers in the used setup:

  • Consumer (Python)
  • Game engine (Java)
  • Players' bots (languages differ)

The consumer calls the game engine using two bots as arguments:
subprocess.call(['setsid', 'sudo', '-nu', 'botrunner', '/opt/bots/sh/run_bots.sh', bot1, bot2]).

Inside the game engine a loop runs pitting the bots against each other, and afterwards all data is saved in a database so players can review their bots. The idea is, should a bot cause an error, to log the error and hand victory to the opponent.

What is the correct place to catch this, though? Should this be done on the "highest" (i.e. consumer) level, or in the game engine itself?


Solution

The correct place to catch any Exception or Error in Java is the place where you have a mechanism to handle it and perform some recovery steps. In the case of OutOfMemoryError, you should catch the error ONLY when you are able to close the application down gracefully, cleanly releasing resources and logging the reason for the failure, if possible.
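As a rough illustration of catching at a boundary that can actually shut down cleanly, here is a minimal sketch of an engine entry point. The names EngineMain, runEngine and EXIT_OOM are hypothetical, and the exit code 3 is only an assumption about what the calling script might be taught to interpret; none of this comes from the original setup.

```java
final class EngineMain {
    // Hypothetical exit code the consumer could map to "engine ran out of memory".
    static final int EXIT_OOM = 3;

    /** Runs the engine once, always releases resources, and returns an exit code. */
    static int runEngine(Runnable engine, AutoCloseable resources) {
        try {
            engine.run();
            return 0;
        } catch (OutOfMemoryError e) {
            // Keep the handler cheap: a constant message avoids building new strings.
            System.err.println("engine aborted: OutOfMemoryError");
            return EXIT_OOM;
        } finally {
            try {
                resources.close();
            } catch (Exception closeFailure) {
                System.err.println("cleanup failed");
            }
        }
    }

    public static void main(String[] args) {
        int code = runEngine(
            () -> { throw new OutOfMemoryError("simulated"); }, // stand-in for the real match loop
            () -> System.out.println("resources released"));
        System.out.println("exit code " + code);
        // A real entry point would end with System.exit(code).
    }
}
```

With this shape the error is handled once, at the outermost level of the engine, and the caller only has to look at the process exit status.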

An OutOfMemoryError occurs when a memory allocation cannot be satisfied with the remaining heap. When the error is thrown, the heap contains exactly the same allocated objects as before the unsuccessful allocation attempt. This is the point at which you should catch the OutOfMemoryError and drop references to run-time objects, freeing memory that cleanup may need.
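One common way to make "drop references to free memory for cleanup" concrete is to hold a spare buffer and release it in the handler. The sketch below is only an illustration: the class EmergencyReserve, the method runWithReserve and the 1 MiB size are assumptions, and there is no guarantee the garbage collector reclaims the buffer before the cleanup code allocates.

```java
final class EmergencyReserve {
    // Assumption: 1 MiB is enough headroom for logging and cleanup; tune for the real engine.
    private static volatile byte[] reserve = new byte[1 << 20];

    /** Runs task; on OutOfMemoryError frees the reserve, runs cleanup, then rethrows. */
    static void runWithReserve(Runnable task, Runnable cleanup) {
        try {
            task.run();
        } catch (OutOfMemoryError e) {
            reserve = null;   // drop the only reference so the GC may reclaim the buffer
            cleanup.run();    // e.g. log the reason, close files and sockets
            throw e;          // do not pretend the failure did not happen
        }
    }

    public static void main(String[] args) {
        try {
            runWithReserve(
                () -> { throw new OutOfMemoryError("simulated"); }, // stand-in for real work
                () -> System.err.println("cleanup ran"));
        } catch (OutOfMemoryError e) {
            System.err.println("rethrown: " + e.getMessage());
        }
    }
}
```

The error is rethrown on purpose, so the sketch supports a clean shutdown rather than continued operation.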

If the JVM is in a repairable state, which you can never determine from within the program, it is even possible to recover and continue after the error. But this is generally considered poor design because, as noted, the program cannot determine that state.

If you see the documentation of java.lang.Error, it says

An Error is a subclass of Throwable that indicates serious problems that a reasonable application should not try to catch.

If you are catching any error on purpose, please remember NOT to blanket catch(Throwable t) {...} everywhere in your code.

More details here.

Other tips

You can catch and attempt to recover from OutOfMemoryError (OOM) exceptions, BUT IT IS PROBABLY A BAD IDEA ... especially if your aim is for the application to "keep going".

There are a number of reasons for this:

As pointed out, there are better ways to manage memory resources than explicitly freeing things; i.e. using SoftReference and WeakReference for objects that could be freed if memory is short.

If you wait until you actually run out of memory before freeing things, your application is likely to spend more time running the garbage collector. Depending on your JVM version and on your GC tuning parameters, the JVM can end up running the GC more and more frequently as it approaches the point at which it will throw an OOM. The slowdown (in terms of the application doing useful work) can be significant. You probably want to avoid this.

If the root cause of your problem is a memory leak, then the chances are that catching and recovering from the OOM will not reclaim the leaked memory. Your application will keep going for a bit, then OOM again, and again, and again at ever-decreasing intervals.
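The SoftReference approach from the first reason above could look roughly like the following cache. SoftCache and its loader parameter are hypothetical names chosen for this sketch; the only library behavior relied on is that softly reachable objects are cleared before the JVM throws an OutOfMemoryError.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, reloading it if the GC has cleared the reference. */
    V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = loader.apply(key);               // recompute on miss or after clearing
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }

    public static void main(String[] args) {
        SoftCache<Integer, int[]> cache = new SoftCache<>(size -> new int[size]);
        int[] first = cache.get(4);
        // While "first" is strongly reachable the entry cannot be cleared.
        System.out.println(cache.get(4) == first);
    }
}
```

The cache never has to free anything explicitly; under memory pressure the entries simply disappear and are rebuilt on the next request.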

So my advice is NOT to attempt to keep going after an OOM ... unless you know:

where and why the OOM happened,
that there won't have been any "collateral damage", and
that your recovery will release enough memory to continue.
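For the last condition, a program can at least estimate how much heap is available after its recovery step. The sketch below uses the standard Runtime counters; HeapHeadroom, availableBytes and canContinue are hypothetical names, and the figure is an approximation, since it can change at any moment and may include garbage that has not been collected yet.

```java
final class HeapHeadroom {
    /** Approximate bytes still available: the heap limit minus what is currently in use. */
    static long availableBytes() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    /** True if the estimate suggests at least requiredBytes could still be allocated. */
    static boolean canContinue(long requiredBytes) {
        return availableBytes() >= requiredBytes;
    }

    public static void main(String[] args) {
        // Assumption: the next unit of work needs roughly 16 MiB.
        long needed = 16L * 1024 * 1024;
        System.out.println("available: " + availableBytes() + " bytes");
        System.out.println("enough for next task: " + canContinue(needed));
    }
}
```

A check like this says nothing about the first two conditions, so it is a sanity check after recovery rather than a reason to keep going.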

There is probably at least one good time to catch an OutOfMemoryError, namely when you are specifically allocating something that might be far too big:

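import java.io.IOException;
import java.io.InputStream;

public static int[] decode(InputStream in, int len) throws IOException {
  int[] result;
  try {
    // The only allocation guarded here is the one sized by the caller-supplied length.
    result = new int[len];
  } catch (OutOfMemoryError e) {
    throw new IOException("Result too long to read into memory: " + len);
  } catch (NegativeArraySizeException e) {
    throw new IOException("Cannot read negative length: " + len);
  }
  // ... fill result from in (omitted in the original) ...
  return result;
}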
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow