Question

I have implemented a healthcheck servlet on our Jetty 8 server using Metrics. Besides being polled by our load balancer, the metrics are written to log files periodically (every 5 minutes). Additionally, email notifications are sent when some of the metrics, such as CPU load or memory consumption, reach critical limits.

This works perfectly for CPU load and system memory consumption. However, one of the metrics, defined to measure JVM memory consumption, regularly exceeds the defined threshold of 95%, although the server is running stably. So we might have to rethink our decision about this particular metric. Is it a good metric to use in a healthcheck? Is it an indication of a memory leak that our web application regularly hits the threshold until the garbage collector runs, or is that normal behavior to be expected of every long-running web application?

Thank you for your input.

Here is our code that implements the JVM memory healthcheck.

Java Runtime Memory

    private final Runtime runtime = Runtime.getRuntime();

    Result check() throws Exception {

        final long freeMem = this.runtime.freeMemory();
        // maxMemory() is the value set by the JVM -Xmx (Max HeapSize) parameter
        final long maxMem = this.runtime.maxMemory();
        final long usedMem = maxMem - freeMem;          

        final double value = RatioGauge.Ratio.of(usedMem, maxMem).getValue();
        final double threshold = 0.95;

        if (value < threshold) {
            // Everything OK: Memory usage is below the threshold.
        } else {
            // NOT OK: Memory usage is above the threshold.
        }
    }

Solution

You need to establish a baseline. As long as your app works fine under the desired load (number of requests/sec, number of online users, etc.), any CPU/memory consumption will do, more or less. Once you have a baseline, you can add features to your code, check whether they make consumption decrease or increase, and act accordingly. Then, if needed, you optimize the places in the code that got worse after the features were added (or you discover that no local code change saves you, and you need more hardware to support the new features under the given load, or you need to redesign some components of your app, but that's another story).
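As a rough illustration of comparing a measurement against a baseline, here is a minimal sketch. The class name `BaselineCheck`, its methods, and the idea of a fixed tolerance are assumptions made for the example; they are not part of Metrics or of the original healthcheck.

```java
// Hypothetical helper: compares a current measurement (heap bytes, CPU load, ...)
// against a baseline recorded earlier under the same load.
class BaselineCheck {

    /** Relative change against the baseline, e.g. 0.10 means 10% higher. */
    static double relativeChange(double baseline, double current) {
        if (baseline <= 0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        return (current - baseline) / baseline;
    }

    /** True if the measurement grew by more than the tolerated fraction. */
    static boolean regressed(double baseline, double current, double tolerance) {
        return relativeChange(baseline, current) > tolerance;
    }
}
```

For example, with a baseline of 400 MB and a tolerance of 20%, a reading of 520 MB after a release would be flagged, while 420 MB would not. Which tolerance makes sense depends on your application.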

So it's not the absolute value that matters most (although having JVM heap consumption constantly at 95% is a bit worrying), but how it changes from one code change to the next.
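One thing worth checking before trusting the absolute value: `Runtime.freeMemory()` reports the free part of the currently allocated heap (`totalMemory()`), not of the maximum heap. Computing used memory as `maxMemory() - freeMemory()` therefore overstates usage whenever the heap has not yet grown to its maximum. Heap usage also normally rises between collections and drops when the garbage collector runs, so a single high reading is not by itself a sign of a leak. Here is a sketch of an alternative calculation; the class name `HeapUsage` and its method names are made up for the example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of Metrics.
class HeapUsage {

    /** Fraction of the maximum heap in use, given the three Runtime values. */
    static double usedRatio(long totalMem, long freeMem, long maxMem) {
        // Used memory is measured against the allocated heap, not the max heap.
        final long usedMem = totalMem - freeMem;
        return (double) usedMem / maxMem;
    }

    /** Same ratio, read from the running JVM via Runtime. */
    static double currentRatio() {
        final Runtime runtime = Runtime.getRuntime();
        return usedRatio(runtime.totalMemory(), runtime.freeMemory(), runtime.maxMemory());
    }

    /** Equivalent reading via the standard java.lang.management API. */
    static double currentRatioViaMxBean() {
        final MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() returns -1 when the maximum is undefined; fall back to committed.
        final long max = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();
        return (double) heap.getUsed() / max;
    }
}
```

With 512 MB allocated, 412 MB of that free, and a 1024 MB maximum, `usedRatio` gives roughly 0.10, whereas `(max - free) / max` would report roughly 0.60 for the same JVM state.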

Licensed under: CC-BY-SA with attribution