Question

Let's imagine a hypothetical HFT system in Java, requiring (very) low latency, with lots of short-lived small objects partly due to immutability (Scala?), thousands of connections per second, and an obscene number of messages passing around in an event-driven architecture (Akka and AMQP?).

For the experts out there, what would (hypothetically) be the best tuning for JVM 7? What type of code would make it happy? Would Scala and Akka be ready for this kind of system?

Note: There have been some similar questions, like this one, but I've yet to find one covering Scala (which has its own idiosyncratic footprint in the JVM).


Solution

On my laptop the average latency of ping messages between Akka 2.3.7 actors is ~300 ns, which is far lower than the latency you should expect from GC pauses on the JVM.

Code (incl. JVM options) and test results for Akka and other actor libraries on an Intel Core i7-2640M are here.
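For a rough idea of how such a ping measurement can be set up, here is a minimal sketch with made-up class names (it is not the linked benchmark; a serious measurement would pin threads, discard warm-up rounds, and record with HdrHistogram or JMH):

```java
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;
import akka.actor.UntypedActor;

// Round-trip ping-pong probe between two classic Akka (2.3.x) actors.
public class PingPongLatency {
    static final Object PING = "ping";

    static class Ponger extends UntypedActor {
        @Override
        public void onReceive(Object msg) {
            getSender().tell(msg, getSelf()); // echo straight back
        }
    }

    static class Pinger extends UntypedActor {
        private final ActorRef ponger;
        private final int rounds;
        private int left;
        private long start;

        public Pinger(ActorRef ponger, int rounds) {
            this.ponger = ponger;
            this.rounds = rounds;
            this.left = rounds;
        }

        @Override
        public void onReceive(Object msg) {
            if ("go".equals(msg)) {
                start = System.nanoTime();
                ponger.tell(PING, getSelf());
            } else if (--left == 0) {
                System.out.printf("avg round trip: %.0f ns%n",
                        (System.nanoTime() - start) / (double) rounds);
                getContext().system().shutdown(); // Akka 2.3 API; later versions use terminate()
            } else {
                ponger.tell(PING, getSelf());
            }
        }
    }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("bench");
        ActorRef ponger = system.actorOf(Props.create(Ponger.class), "ponger");
        ActorRef pinger = system.actorOf(Props.create(Pinger.class, ponger, 1_000_000), "pinger");
        pinger.tell("go", ActorRef.noSender());
    }
}
```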

P.S. You can find lots of principles and tips for low-latency computing on Dmitry Vyukov's site and in Martin Thompson's blog.

OTHER TIPS

It is possible to achieve very good performance in Java. However, the question needs to be more specific for a credible answer. Your main sources of latency will come from the following non-exhaustive list:

  1. How much garbage you create and the work the GC must do to collect and promote it. Immutable designs in my experience do not fit well with low latency. GC tuning needs to be a big focus (see the flag and warm-up sketch after this list).

  2. Warm up the JVM so that classes are loaded and the JIT has had time to do its work.

  3. Design your algorithms to be O(1) or at least O(log2 n), and have performance tests that assert this.

  4. Your design needs to be lock-free and follow the "Single Writer Principle".

  5. A significant effort needs to be put into understanding the whole stack and showing mechanical sympathy in its use.

  6. Design your algorithms and data structures to be cache-friendly. Cache misses these days are the biggest cost. This is closely related to process affinity, which, if not set up correctly, can result in significant cache pollution. It will involve sympathy for the OS and, in some cases, even some JNI code (a false-sharing padding sketch also follows this list).

  7. Ensure you have sufficient cores so that any thread that needs to run has a core available without having to wait.
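On points 1 and 2, here is a hedged illustration: a throw-away warm-up loop plus, in the comment, one possible set of JVM 7 flags (fixed pre-touched heap, CMS, GC logging, earlier JIT compilation). The flag values, the `OrderBook` interface, and the main class name are placeholders; the only defensible numbers are the ones backed by your own GC logs and profiles.

```java
// Illustrative JVM 7 launch flags (values are placeholders; tune against GC logs):
//
//   java -server -Xms4g -Xmx4g -Xmn2g              (fixed, pre-sized heap)
//        -XX:+AlwaysPreTouch                        (touch heap pages at start-up, not mid-session)
//        -XX:+UseConcMarkSweepGC -XX:+UseParNewGC   (low-pause collector pairing on JVM 7)
//        -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime   (make every pause visible)
//        -XX:CompileThreshold=1000                  (JIT hot methods sooner)
//        com.example.TradingApp                     (placeholder main class)
public final class Warmup {

    /** Hypothetical hot-path component, present only so the sketch compiles. */
    interface OrderBook {
        void quote(long price, long size);
    }

    // Drive the real hot path with throw-away but representative inputs so the
    // JIT has compiled and inlined it before live traffic arrives.
    public static void warmUp(OrderBook book) {
        for (int i = 0; i < 200_000; i++) {   // comfortably past the server-compiler threshold
            book.quote(i % 100, 1 + (i % 10));
        }
    }
}
```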
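On point 6, the classic trick against false sharing is to pad hot fields so that two independently written values never land on the same cache line. A minimal, Java 7 style sketch with manual padding (note the JVM spec does not guarantee field layout, which is why Disruptor pads via a class hierarchy and Java 8 later added @Contended):

```java
// Pad a hot, single-writer counter so the thread writing it does not keep
// invalidating a cache line shared with its neighbours.
public final class PaddedCounter {
    // Padding before the hot field (7 longs plus the object header roughly fill a 64-byte line).
    private long p1, p2, p3, p4, p5, p6, p7;

    private volatile long value;

    // Padding after the hot field, keeping the next object's fields off this line.
    private long q1, q2, q3, q4, q5, q6, q7;

    public long get() { return value; }

    // Single Writer Principle: only one thread ever calls set().
    public void set(long v) { value = v; }

    // Defensive: discourages tools or future JVMs from treating the padding as dead.
    long paddingSum() {
        return p1 + p2 + p3 + p4 + p5 + p6 + p7 + q1 + q2 + q3 + q4 + q5 + q6 + q7;
    }
}
```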

I recently blogged about a case study of such an exercise.

You may find that using a ring buffer for message passing will surpass what can be done with Akka. The main ring buffer implementation that people use on the JVM for financial applications is the Disruptor, which is carefully tuned for efficiency (power-of-two size), for the JVM (no GC, no locks), and for modern CPUs (no false sharing of cache lines).
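As a rough illustration of that pattern, here is a sketch against the Disruptor 3.3+ DSL with made-up event and handler names; events are pre-allocated inside the ring and mutated in place, so the publishing hot path allocates nothing:

```java
import com.lmax.disruptor.EventFactory;
import com.lmax.disruptor.EventHandler;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.util.DaemonThreadFactory;

public class DisruptorSketch {

    // Mutable, pre-allocated event: the ring owns it, so publishing creates no garbage.
    static final class MarketEvent {
        long price;
    }

    public static void main(String[] args) {
        int bufferSize = 1 << 14; // must be a power of two

        Disruptor<MarketEvent> disruptor = new Disruptor<MarketEvent>(
                new EventFactory<MarketEvent>() {
                    public MarketEvent newInstance() { return new MarketEvent(); }
                },
                bufferSize,
                DaemonThreadFactory.INSTANCE);

        // Consumer runs on its own thread and is handed events in sequence order.
        disruptor.handleEventsWith(new EventHandler<MarketEvent>() {
            public void onEvent(MarketEvent event, long sequence, boolean endOfBatch) {
                // act on event.price here
            }
        });

        RingBuffer<MarketEvent> ring = disruptor.start();

        // Single publisher: claim a slot, fill it in place, then publish.
        long seq = ring.next();
        try {
            ring.get(seq).price = 42L;
        } finally {
            ring.publish(seq);
        }
    }
}
```

Note the try/finally around publish: a claimed sequence must always be published, otherwise downstream consumers stall waiting for it.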

Here is an intro presentation from a Scala point of view: http://scala-phase.org/talks/jamie-allen-sdisruptor/index.html#1. There are links on the last slide to the original LMAX material.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow