Question

My basic requirements from a GraphDB:

  • Mature (production-ready)
  • Native .NET or C++ language binding
  • Horizontal scalability: both
    • Automated data redundancy and sharding
    • Distributed graph algorithms / query execution

So far I have disqualified the following:

  • InfiniteGraph: no C++ / .NET language binding
  • HyperGraphDB: no C++ / .NET language binding
  • Microsoft Trinity: Not mature
  • Neo4j: not distributed

I'm not sure about the scalability of the following:

  • Sparsity DEX
  • Franz Inc. AllegroGraph
  • Sones GraphDB

I found the available information about horizontal scalability capabilities to be quite general. I guess there are good reasons for this.

Any information would be appreciated.

Solution

Unfortunately, your basic requirements already go beyond today's general understanding of graphs, even in academia. None of the pure graph databases listed will be able to satisfy all your needs. Distributed graph algorithms that are aware of large, distributed but interconnected graphs are still an open research problem. So for your application it might be best to find a well-matching graph database, graph processing stack or RDF store and implement the missing parts on your own.

If your application is mostly online transactional graph processing (OLTP, read/write heavy) with a focus on the vertices, and you can do without the distributed algorithms for the moment, then use one of these:

  • Neo4j
  • OrientDB
  • DEX
  • HyperGraphDB
  • InfiniteGraph
  • InfoGrid
  • Microsoft Horton

If it is more online analytical processing (OLAP, mostly read), still with a focus on the vertices, and distribution really matters, then consider one of these:

  • Apache Hama (early stage project)
  • Microsoft Trinity (research project)
  • Golden Orb (good, but Java only)
  • Signal/Collect (http://www.ifi.uzh.ch/ddis/research/sc , but a research project)
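
To give a feel for what a vertex-focused graph algorithm looks like, here is a minimal single-process sketch of a vertex-centric computation: in each round, active vertices send their current label to their neighbours, and each vertex keeps the smallest label it has seen. It computes connected components. The function name and the round structure are my own illustration, not the API of any project listed above; a real distributed engine would additionally partition the vertices across machines and ship the messages over the network.

```python
from collections import defaultdict


def connected_components(edges):
    """Label every vertex with the smallest vertex id in its component.

    Illustrative only: runs in one process, with each loop iteration
    playing the role of one synchronous round of message passing.
    """
    # Build an undirected adjacency structure.
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    label = {v: v for v in neighbors}   # every vertex starts as its own label
    active = set(neighbors)             # vertices whose label changed last round

    while active:
        # Send phase: each active vertex sends its label to its neighbours.
        inbox = defaultdict(list)
        for v in active:
            for n in neighbors[v]:
                inbox[n].append(label[v])

        # Receive phase: each vertex adopts the smallest label it received.
        active = set()
        for v, messages in inbox.items():
            best = min(messages)
            if best < label[v]:
                label[v] = best
                active.add(v)           # changed, so it must notify neighbours

    return label


if __name__ == "__main__":
    print(connected_components([(1, 2), (2, 3), (4, 5)]))
```

The hard part the answer refers to is everything this sketch leaves out: deciding which machine owns which vertex, and keeping the message traffic between machines small when the graph is heavily interconnected.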

Or, if the focus is more on the edges and on logical reasoning/pattern matching, and you need (or, better, can live with) distribution at the edge level as in the Semantic Web, then use one of these RDF/triple/quad stores:

  • AllegroGraph (okay, they are a graphdb/rdf store hybrid ;)
  • Jena
  • Sesame
  • Stardog
  • Virtuoso
  • ...and many more RDF stores
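
The difference between distributing at the vertex level and at the edge level can be sketched as follows. This is a minimal illustration assuming plain hash partitioning and a graph stored as (subject, predicate, object) triples; the function names and the example data are hypothetical, and real stores use more elaborate placement schemes.

```python
import zlib


def shard_of(key, num_shards):
    """Map a string key to a shard index.

    crc32 is used because it is stable across runs, unlike Python's
    built-in hash() for strings.
    """
    return zlib.crc32(key.encode("utf-8")) % num_shards


def partition_by_vertex(triples, num_shards):
    """Vertex-level placement: all outgoing edges of a vertex share a shard."""
    shards = [[] for _ in range(num_shards)]
    for s, p, o in triples:
        shards[shard_of(s, num_shards)].append((s, p, o))
    return shards


def partition_by_edge(triples, num_shards):
    """Edge-level placement: each triple is placed independently."""
    shards = [[] for _ in range(num_shards)]
    for s, p, o in triples:
        shards[shard_of(f"{s}|{p}|{o}", num_shards)].append((s, p, o))
    return shards


if __name__ == "__main__":
    triples = [
        ("alice", "knows", "bob"),
        ("alice", "knows", "carol"),
        ("bob", "worksAt", "acme"),
        ("carol", "worksAt", "acme"),
    ]
    for name, fn in [("vertex", partition_by_vertex), ("edge", partition_by_edge)]:
        print(name, fn(triples, 3))
```

With vertex-level placement, a lookup of everything one vertex points to touches a single shard, but a very high-degree vertex overloads that shard. With edge-level placement the load spreads more evenly, at the cost of a vertex's edges being scattered across shards.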

Good starting points might be DEX or Neo4j. If you're looking for a good and really fast graph database kernel for C++, DEX might be best, but you would have to implement a lot of the networking and distribution yourself. Neo4j offers a lot of distribution and fault tolerance, but at the moment more at a vertex sharding level, and its kernel is Java. For ideas and inspiration on implementing distributed graph algorithms, perhaps take a look at Golden Orb and Signal/Collect. An alternative approach might be to start with AllegroGraph or Stardog. AllegroGraph in particular might be a bit tricky in the beginning until you get used to their way of thinking. Stardog is still young and Java-based, but fast and already quite mature.

Licensed under: CC-BY-SA with attribution