Question

I am looking at integrating Neo4j into a Clojure system I am building. The first question I was asked was why I didn't use Datomic. Does anyone have a good answer for this? I have heard of and seen videos on Datomic, but I don't know enough about Graph Databases to know the difference between Neo4j and Datomic, and what difference it would make to me?

Was it helpful?

Solution

There are a few fundamental difference between them:

Data Model

Both Neo4j and Datomic can model arbitrary relationships. They both use, effectively, an EAV (entity-attribute-value) schema so they both can model many of the same problem domains except Datomic's EAV schema also embeds a time dimension (i.e. EAVT) which makes it very powerful if you want to perform efficient queries against your database at arbitrary points in time. This is something that non-immutable data stores (Neo4j included) could simply not do.

Data Access

Both Neo4j and Datomic provide traversal APIs and query languages:

Queries

Both Neo4j and Datomic provide declarative query languages (Cypher and Datalog, respectively) that support recursive queries except Datomic's Datalog provides far superior querying capabilities by allowing custom filtering and aggregate functions to be implemented as arbitrary JVM code. In practice, this means Cypher's built-in functions can effectively be superseded by Clojure's sequence library. This is possible because your application, not the database, is the one running queries.

Traversal

Traversal APIs are always driven by application code, which means both Neo4j and Datomic are able to walk a graph using arbitrary traversal, filtering and data transformation code except Neo4j requires a running transaction which in practice means it's time-bounded.

Data Consistency

Another fundamental difference is that Datomic queries don't require database coordination (i.e. no read transactions) and they always work with a consistent data snapshot which means you could perform multiple queries and data transformations over an arbitrary period of time and guarantee your results will always be consistent and that no transaction will timeout (because there's none). Again, this is impossible to do in non-immutable data stores like the vast majority of existing databases (Neo4j included). This also applies to their traversal APIs.

Both Neo4j and Datomic are transactional (ACID) systems, but because Neo4j uses traditional interactive transactions -using optimistic concurrency controls-, queries need to happen inside transactions (need to be coordinated) which imposes timeout constraints to your queries. In practice, this means that for very complex, long-running queries, you'll end-up splitting your queries, so they finish within certain time limits, giving up data consistency.

Working Set

If for some reason your queries needed to involve a huge amount of data (more than it would normally fit in memory) and you couldn't stream the results (since Datomic provides streaming APIs), Datomic would probably not be a good fit since you wouldn't be taking advantage of Datomic's architecture, forcing peers to constantly evict their working memory, performing additional network calls and decompressing data segments.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top