Question

I have been exploring MMDB systems lately and I haven't been able to find much information with regards to how an in-memory database is supposed to scale. My quite basic assumption is that a main-memory db is constrained by the memory available on the db node, and by the operating system management of this memory. So how can I expand an in-memory system size beyond that of the main memory available? I assume the answer is along the lines of a distributed system but I haven't got it clear in my head how it would work. And of course it's also possible I completely misunderstood the idea of mmdb and i'm missing something obvious.

A bit of background on the question: I am writing a number of cross-platform mobile apps (even though my background is heavily involved with mysql and mongodb), and I don't like native database solutions like sqlite for android and ios. So I thought I'd write my own solution (site and github) in javascript (i'm working on cordova/phonegap). I realised that I could make this a nodejs module and use it as a db for a web app (I'm creating a blog powered by it as an experiment and it's working pretty well), but of course I'm now thinking of making it a separate tier and I started thinking about the obvious limitation of memory size, hence my question.

Was it helpful?

Solution

in-memory databases scale in size the same way as on-disk (aka persistent) databases do: either throw more storage at it (memory, in this case) or distribute it across multiple nodes of a cluster. The latter alternative increases the complexity (both of the DBMS, and your administration of it), relative to an in-memory database on a single system. Consider the difference between vanilla MySQL and MySQL Cluster. And, you'll want to have a really fast network for those times when the DBMS has to perform inter-node operations (e.g. distribute the data or pull data from multiple nodes to satisfy a query).

There's nothing particularly special about in-memory databases in this regard. There are some special optimizations in the database engine when you know storage is memory. But it doesn't change the fundamental principles of database systems.

What you don't want to do is create an in-memory database larger than physical memory. You'll force the OS to swap in-memory database pages in/out of swap space, and the performance will suck. You're better off, in that case, using a conventional DBMS and giving it as much cache as you have memory available for. The DBMS will use the cache more intelligently than the OS' will the swap space.

OTHER TIPS

Current production-ready in-memory databases have mainly focused on scale-up as opposed to scale-out. So-far, they have either managed to integrate main memory tier into their core, existing architecture (IBM via Blu acceleration) or have re-built the database from almost-scratch to leverage the main-memory as primary storage layer (SAP HANA), and in both cases their claim to fame is the obvious speedup that DRAM offers in comparision to the disk.

However very few databases, presently, have a complete offering which scales-out in-memory performance benefit accross multiple nodes. Most of the in-memory databases require the applications to manage the distribution of data/objects across nodes (Ex: SAP HANA).

Oracle's DBIM and MemSQL are a few scalable and distributed options, at this time, that implement distributed in-memory database/tier by collective utilization of memory resources across the cluster (RAC in case of Oracle). MemSQL can be deployed on a cluster of commodity compute nodes and it claims to scale by utilizing aggregate resources, including memory. Oracle RAC is a shared cache architecture that overcomes the limitations of traditional shared-nothing and shared-disk approaches to provide highly scalable and available database solutions, including in-memory benefits.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top