Question

We are evaluating some Java-based in-memory databases like Hazelcast and VoltDB. If we replicate the data across multiple servers, how likely is it that GC will hit both nodes at the same time?

For example, we have two nodes with 500 GB of memory each, and we know that GC will affect our performance drastically once it kicks in. So what is the probability that GCs on both nodes will hit together?

To put this another way: is it possible to prevent GCs from hitting the two nodes simultaneously through configuration? We are expecting a throughput of around 15k requests per second, so with distribution across 4 or more nodes we can tolerate a hit on one node at a time (a 25% performance hit) and size accordingly.


Solution

As Ben points out, VoltDB stores all data off heap. The heap is only used for scratch space during transaction routing and stored procedure execution, so data for each transaction lives for only a few milliseconds, and most of it is never promoted or live during a GC. Actual SQL execution takes place off heap as well, so temp tables don't generate garbage.

GCs in VoltDB should represent < 1% of execution time. You can choose that percentage by sizing the young generation appropriately. Real-world deployments at that throughput do a young-gen GC every handful of seconds, and those GCs should block for only single-digit milliseconds. Old-gen GCs should be infrequent, on the order of days, and should block for only tens of milliseconds. You can invoke them manually if you want to make sure they happen during off-peak times.
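To make the manual old-gen collection concrete, here is a minimal, stdlib-only Java sketch. Note that `System.gc()` is only a hint to the JVM (default HotSpot settings honor it with a full collection); the off-peak scheduling itself is left to whatever job scheduler you already run.

```java
import java.util.concurrent.TimeUnit;

public class ManualGc {
    // Request a full (old-gen) collection and measure roughly how long
    // the call blocks. With default HotSpot settings System.gc() triggers
    // a full GC; with -XX:+DisableExplicitGC it is a no-op.
    static long forcedGcMillis() {
        long start = System.nanoTime();
        System.gc();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // In a real deployment this would be invoked from a scheduler
        // during a known off-peak window rather than at startup.
        System.out.println("Forced GC blocked for ~" + forcedGcMillis() + " ms");
    }
}
```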

I don't see why concurrent GCs across nodes would matter. The worst case is every node that is a dependency of a transaction doing a GC back to back, so that the latency is the sum across the involved nodes. I suggest you measure and see whether it actually impacts throughput for a period of time that matters to you.
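One low-effort way to do that measurement is the GC management beans that ship with the JDK: sample them periodically alongside your latency numbers and see whether jumps in collection count or accumulated pause time line up with spikes. A minimal sketch:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    // Sums cumulative collection counts across all collectors; sample this
    // periodically and correlate jumps with latency spikes in your benchmark.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            if (count > 0) total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", totalPauseMs=" + gc.getCollectionTime());
        }
    }
}
```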

We put a lot of effort into latency in the most recent release and I can share one of the KPIs.

This is a 3-node benchmark of a 50/50 read/write workload with 32-byte keys and 1024-byte values. There is a single client with 50 threads, a node failure occurs during the benchmark, and the benchmark runs for 30 minutes. This is not a throughput benchmark, so there is only one client instance with a smallish number of threads.

Average throughput:               94,114 txns/sec
Average latency:                    0.46 ms
10th percentile latency:            0.26 ms
25th percentile latency:            0.32 ms
50th percentile latency:            0.45 ms
75th percentile latency:            0.54 ms
90th percentile latency:            0.61 ms
95th percentile latency:            0.67 ms
99th percentile latency:            0.83 ms
99.5th percentile latency:          1.44 ms
99.9th percentile latency:          3.65 ms
99.999th percentile latency:       16.00 ms

If you analyze the numbers further and correlate them with other events and metrics, you find that GC is not a factor even at high percentiles. HotSpot's ParNew collector is very good if you can keep your working set small and avoid promotion, and even when it's bad in terms of latency it's good in terms of throughput.

Databases that store data on heap do have to be more concerned about GC pauses. At VoltDB we are only concerned about them because we are frequently evaluated by maximum pause time, not average pause time or pause time at some percentile.

OTHER TIPS

If you really want to prevent GC issues, don't use the heap. That is why we are adding an off-heap commercial offering for Hazelcast.

On a general level: you get GC issues if you retain objects too long, or create objects at such a high frequency that they end up being copied to the tenured space. So a lot of high-speed applications try to prevent creating object litter in the first place.
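A minimal sketch of that litter-avoidance idea: reuse a per-thread scratch buffer instead of allocating a fresh array for every request, so nothing survives long enough to be promoted. The `encode` helper here is purely illustrative, not part of any Hazelcast API.

```java
public class BufferReuse {
    // Hypothetical per-thread scratch buffer reused across requests,
    // instead of allocating a new byte[] for every message.
    private static final ThreadLocal<byte[]> SCRATCH =
            ThreadLocal.withInitial(() -> new byte[1024]);

    // Writes the decimal digits of a non-negative value into the reused
    // buffer (least significant digit first) with no per-call allocation,
    // so nothing is ever promoted to the tenured space.
    static int encode(long value) {
        byte[] buf = SCRATCH.get();
        int len = 0;
        do {
            buf[len++] = (byte) ('0' + (value % 10));
            value /= 10;
        } while (value != 0);
        return len; // number of digit bytes written
    }

    public static void main(String[] args) {
        System.out.println(encode(12345)); // prints 5 (digits written)
    }
}
```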

I'm currently working on a POC implementation of Hazelcast where object creation is completely removed.

There is no configuration that can prevent GC from kicking in simultaneously on different JVMs. That said, you should look at your application and could fine-tune the GC.

Assuming you're running Hazelcast/VoltDB on big(ger) servers with plenty of memory and cores, the Garbage-First (G1) garbage collector in newer versions of Java could largely ameliorate your concern.

http://www.oracle.com/technetwork/java/javase/tech/g1-intro-jsp-135488.html

VoltDB stores table data off the heap. The memory is allocated by the SQL Execution Engine processes which are written in C++.

The Java heap in VoltDB is used for relatively static deployment and schema-related data, and for short-term data as it handles requests and responses. Even much of that is kept off-heap using direct byte buffers and other structures (read more about that here).
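The direct-byte-buffer technique mentioned here is plain `java.nio`: the buffer's contents are allocated outside the Java heap, so the collector never scans or copies them; only the small wrapper object lives on-heap. A small illustration:

```java
import java.nio.ByteBuffer;

public class OffHeapBuffer {
    public static void main(String[] args) {
        // allocateDirect places the backing storage outside the Java heap,
        // so a GC never moves or copies the 64 bytes below.
        ByteBuffer buf = ByteBuffer.allocateDirect(64);
        System.out.println("direct? " + buf.isDirect()); // prints: direct? true

        buf.putInt(42).putLong(500L);
        buf.flip(); // switch from writing to reading
        System.out.println(buf.getInt());  // prints 42
        System.out.println(buf.getLong()); // prints 500
    }
}
```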

For an in-memory DB that maintains consistency the way Geode does (i.e., synchronous replication to other nodes before releasing the client thread), your network is going to be a bigger concern than the HotSpot compiler. Still, here are two points of input to get you to the point where language is irrelevant:

1) If you are doing lots of creates/updates versus reads: use off-heap memory on the server. This minimizes GCs.

2) Use Geode's serialization mapping between C/C++ and Java objects to avoid JNI. Specifically, use the DataSerializer: http://gemfire.docs.pivotal.io/geode/developing/data_serialization/gemfire_data_serialization.html If you plan to use queries extensively rather than gets/puts, use the PDXSerializer: http://gemfire.docs.pivotal.io/geode/developing/data_serialization/use_pdx_serializer.html
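For illustration, here is a stdlib-only sketch of the `toData`/`fromData` shape that Geode's `DataSerializable` contract follows. The real interface lives in the Geode jar; this class does not implement it, it just mirrors the pattern against `java.io.DataOutput`/`DataInput`, with key/value sizes echoing the benchmark above.

```java
import java.io.*;

// Stdlib-only sketch of the toData/fromData pattern used by Geode's
// DataSerializable (the actual interface is org.apache.geode.DataSerializable).
public class KeyValueEntry {
    byte[] key;    // e.g. a 32-byte key
    byte[] value;  // e.g. a 1024-byte value

    void toData(DataOutput out) throws IOException {
        out.writeInt(key.length);   // length-prefix each field so it can
        out.write(key);             // be read back without reflection
        out.writeInt(value.length);
        out.write(value);
    }

    void fromData(DataInput in) throws IOException {
        key = new byte[in.readInt()];
        in.readFully(key);
        value = new byte[in.readInt()];
        in.readFully(value);
    }

    public static void main(String[] args) throws IOException {
        KeyValueEntry e = new KeyValueEntry();
        e.key = "k1".getBytes();
        e.value = "v1".getBytes();

        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        e.toData(new DataOutputStream(bos));

        KeyValueEntry decoded = new KeyValueEntry();
        decoded.fromData(new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));
        System.out.println(new String(decoded.key) + "=" + new String(decoded.value)); // prints k1=v1
    }
}
```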

Licensed under: CC-BY-SA with attribution