Is a JVM based dbms like Neo4J implemented ideally?

https://softwareengineering.stackexchange.com/questions/409004

10-03-2021

Question

Since Neo4J is implemented in Java and therefore uses the JVM, wouldn't an equivalent graph database that is written in C++ / Rust or GoLang be more performant? Why would one decide to build a DBMS in a medium "high level" language like Java?

Answer

I can think of a few good reasons:

Java is ubiquitous. You can find a Java programmer on almost every street corner.
Java has many characteristics that make it a good (but not great) general-purpose language, including readability.
Java is the same language that's commonly used to develop large enterprise programs.
Most importantly, Java is clearly not a bottleneck; otherwise, they wouldn't have chosen it.

Performance is often dictated by the "slowest component in the system." More often than not, the slowest component in a system with a database is "the wire": the network the database operates on, and the medium by which it must transmit the information it retrieves to the client. This transmission time is almost always going to be the bulk of the time a query takes.

This "slowest component in the system" effect is also evident in the code we write. We never optimize every part of our program, because the bulk of the execution time is always spent in 10 percent of the code or less. So we fire up a profiler, let it tell us where that 10 percent is, and optimize only that code.

Performance isn't the most important metric in a database anyway; scalability is. Raw performance has more to do with how you craft your queries (returning larger chunks of data instead of taking query hits on several smaller chunks), and it's more important for a database to be able to keep up with a large number of users than it is to get the fastest possible speed on a single query.
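To put rough numbers on the "larger chunks versus several smaller chunks" point, here is a minimal back-of-the-envelope sketch. The class name `RoundTripCost`, the 500 microsecond round trip, and the 1 microsecond of server work per row are all assumptions chosen for illustration; they are not measurements of Neo4J or of any real network.

```java
// Illustrative cost model only: total time = network round trips + server work.
// All figures are assumed values, not measurements.
final class RoundTripCost {

    // Total time in microseconds for fetching `rows` rows using `roundTrips` queries.
    static long totalMicros(long roundTrips, long rttMicros, long rows, long serverMicrosPerRow) {
        return roundTrips * rttMicros + rows * serverMicrosPerRow;
    }

    public static void main(String[] args) {
        long rows = 1_000;          // rows the client needs
        long rttMicros = 500;       // assumed network round-trip time
        long perRowMicros = 1;      // assumed server-side cost per row

        long oneByOne = totalMicros(rows, rttMicros, rows, perRowMicros); // one query per row
        long batched = totalMicros(1, rttMicros, rows, perRowMicros);     // one query for all rows

        System.out.println("one query per row: " + oneByOne + " us");
        System.out.println("single batched query: " + batched + " us");
    }
}
```

Under these assumed figures the server-side work is identical in both cases; the difference comes entirely from the number of round trips, which is why the language the server is written in barely shows up in the total.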

Finally, the way a database system is written has far more impact on its speed than the programming language it is written in. You're going to get a much faster result searching a billion records through a B-tree index, where a lookup is O(log n) (roughly 30 comparisons, since log2 of a billion is about 30, and even fewer node visits in practice because each B-tree node holds many keys), than you are with a linear search, which is O(n) and may require looking at all 1 billion records to find the one you want.
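The gap between a logarithmic index lookup and a linear scan can be made concrete with a small sketch. It uses a plain binary search over a virtual sorted array as a stand-in for an index lookup; this is not how Neo4J or a real B-tree is implemented, and the class and method names are hypothetical. The "array" is never allocated: the element at index i is simply assumed to be i.

```java
// Counts how many elements each search strategy has to inspect.
// Binary search stands in for an index lookup; a real B-tree has a much
// higher fanout per node, so it would visit fewer nodes than this suggests.
final class SearchCost {

    // Number of elements a binary search inspects when looking for `target`
    // in a virtual sorted array of size n whose element at index i is i.
    static int binarySearchProbes(long n, long target) {
        long lo = 0;
        long hi = n - 1;
        int probes = 0;
        while (lo <= hi) {
            long mid = lo + (hi - lo) / 2;
            probes++;
            if (mid == target) {
                return probes;
            }
            if (mid < target) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return probes; // target not present
    }

    // Number of elements a front-to-back linear scan inspects before it
    // reaches the element at index `target`.
    static long linearScanProbes(long target) {
        return target + 1;
    }

    public static void main(String[] args) {
        long n = 1_000_000_000L;
        long target = n - 1; // last element: worst case for the linear scan
        System.out.println("binary search probes: " + binarySearchProbes(n, target));
        System.out.println("linear scan probes:   " + linearScanProbes(target));
    }
}
```

For a billion elements the binary search inspects about 30 of them, while the linear scan may inspect all billion. No choice of implementation language closes a gap of that size.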

So as you can see, there are many factors that go into the performance and scalability of a database. The programming language you use is just one factor, and it's arguably the least important.

Other tips

For Rust and Go, the reason Neo4J isn't implemented in them is rather simple: they didn't exist. Neo4J was first published in 2007, and release 1.0 came in 2010. Design sketches for Rust started in 2006, but there was no stable release until 2015, 8 years after the first public code of Neo4J. Design of Go started in 2007, but there was no stable release until 2012, 5 years after the first public code of Neo4J.

But also, your basic assumption is wrong: there is no reason for applications written in Java to be slower than applications written in other languages.

In fact, depending on the benchmark, Java can be faster than C++. People underestimate three things: how good Oracle's C2 compiler, Azul's compiler, IBM's compiler, etc. really are; how hard it is to get adequate information for optimizations at compile time, whereas a JIT compiler can measure, profile, and benchmark the running program as much as it wants; and how much the low-levelness of C and C++ (which made it possible to get performance via hand-tuning code 50 years ago) actually hurts a modern, aggressively optimizing compiler, because the compiler constantly has to assume that the programmer is trying to go behind its back.

Another aspect that Robert alludes to in his answer is that databases are often I/O-bound, not compute-bound, so the raw CPU performance of the implementation language often doesn't matter much.

Somewhat related to that, many applications are written in Java, or otherwise hosted on the JVM, and crossing the boundary between the JVM and native code is really expensive. So, even if a Neo4J clone written in C++ were theoretically faster (which I doubt), systems built with "Neo4C++" may still be slower, because every access to the database needs to cross the JVM / native boundary.

This was much worse when Neo4J was first written: JNI was even more painfully slow than it is today, and JNA only appeared in 2007, with limited features and limited performance.

So, in summary:

It is in no way given that C++ would be more performant.
Even if C++ were more performant, databases are I/O-bound, not CPU-bound, so it wouldn't actually make the database more performant.
Even if it made the database more performant, that performance would probably be more than negated by being forced to cross the JVM / native barrier.

License: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange