Question

I'm intrigued in the database service Datomic, but I'm not sure if it fits the needs of the projects I work on. When is Datomic a good choice, and when should it be avoided?

Was it helpful?

Solution

With the proviso that I haven't used Datomic in production, thought I'd give you an answer.

Advantages

  1. Datalog queries are powerful (more so than non-recursive SQL) and very expressive.
  2. Queries can be written with Clojure data structures, and it's NOT a weak DSL like many SQL libraries that allow you to query with data structures.
  3. It's immutable, so you get the advantages that immutability gives you in Clojure/other languages as well a. This also allows you to store, while saving structures, all past facts in your database—this is VERY useful for auditing & more

Disadvantages

  1. It can be slow, as Datalog is just going to be slower than equivalent SQL (assuming an equivalent SQL statement can be written).
  2. If you are writing a LOT, you could maybe need to worry about the single transactor getting overwhelmed. This seems unlikely for most cases, but it's something to think about (you could do a sort of shard, though, and probably save yourself; but this isn't a DB for e.g. storing stock tick data).
  3. It's a bit tricky to get up and running with, and it's expensive, and the licensing and price makes it difficult to use a hosted instance with it: you'll need to be dealing with sysadminning this yourself instead of using something like Postgres on Heroku or Mongo at MongoHQ

I'm sure I'm missing some on each side, and though I have 3 listed under disadvantages, I think that the advantages outweigh them in more circumstances where disadvantages don't preclude its use. Price is probably the one that will prevent its being used in most small projects (that you expect to outlast the 1 year free trial).

Cf. this short post describing Datomic simply for some more information.

Expressivity (c.f. Datalog) and immutability are awesome. It's SO much fun to work with Dataomic in that regard, and you can tell it's powerful just by using it a bit.

OTHER TIPS

One important thing when considering if Datomic is the right fit for your application is to think about shape of the data you are going to store and query - as Datomic facts are actually very similar to RDF triples (+ first class time notion) it lends itself very good to modeling complex relationships (linked graph data) - something which is often cumbersome with traditional SQL databases. I found this aspect to be one of the most appealing and important for me, it worked really well, even if this is of course not something exclusive to Datomic, as there are many other high-quality offerings for graph databases, one must mention Neo4J when we are talking about JVM based solutions.
Regarding Datomic schema, i think it's just the right balance between flexibility and stability.

To complete the above answers, I'd like to emphasize that immutability and the ability to remember the past are not 'wizardry features' suited to a few special case like auditing. It is an approach which has several deep benefits compared to 'mutable cells' databases (which are 99% of databases today). Stuart Halloway demonstrates this nicely in this video: the Impedance Mismatch is our fault.

In my personal opinion, this approach is fundamentally more sane conceptually. Having used it for several months, I don't see Datomic has having crazy magical sophisticated powers, rather a more natural paradigm without some of the big problems the others have.

Here are some features of Datomic I find valuable, most of which are enabled by immutability:

  1. because reading is not remote, you don't have to design your queries like an expedition over the wire. In particular, you can separate concerns into several queries (e.g find the entities which are the input to my query - answer some business question about these entities - fetch associated data for presenting the result)
  2. the schema is very flexible, without sacrificing query power
  3. it's comfortable to have your queries integrated in your application programming language
  4. the Entity API brings you the good parts of ORMs
  5. the query language is programmable and has primitives for abstraction and reuse (rules, predicates, database functions)
  6. performance: writers impede only other writers, and no one impedes readers. Plus, lots of caching.
  7. ... and yes, a few superpowers like travelling to the past, speculative writes or branching reality.

Regarding when not to use Datomic, here are the current constraints and limitations I see:

  1. you have to be on the JVM (there is also a REST API, but you lose most of the benefits IMO)
  2. not suited for write scale, nor huge data volumes
  3. won't be especially integrated into frameworks, e.g you won't currently find a library which generates CRUD REST endpoints from a Datomic schema
  4. it's a commercial database
  5. since reading happens in the application process (the 'Peer'), you have to make sure that the Peer has enough memory to hold all the data it needs to traverse in a query.

So my very vague and informal answer would be that Datomic is a good fit for most non-trivial applications which write load is reasonable and you don't have a problem with the license and being on the JVM.

As an analogy, you can ask yourself the same question for Git as compared to other version control systems which are not based on immutability.

Just to tentatively add over the other answers:

It is probably fair to say datomic presents the better conceptual framework for a queryable data store of all other current options out there, while being partially scalable and not exceptionally performant.

I say only partially scalable, because queries need to fit in the peer RAM or fail. And not exceptionally performant, as top-notch SQL engines can optimize queries to fit in memory through sophisticated execution plans, something I've not yet seen mentioned as a feature in datomic; Datomic's decoupling of transacting and querying might in the overall offset this feature.

Unlike many NoSQL engines though, transactions are a first-class citizen, which puts it at par with RDBMS systems in that key regard.

For applications where data is read more than being written, transactions are needed, queries always fit in memory or memory is very cheap, and the overall size of accumulated data isn't too large, it might be a win where a commercial-only product can be afforded ― for those who are willing to embrace its novel conceptual framework implied in the API.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top