When should I consider using a in memory database and what are the issue to look out for?

https://stackoverflow.com/questions/1593692

22-09-2019
|

Question

I was just think that now it is common to have enough RAM on your database server to cache your complete database why are the specialist in memory database (e.g TimesTen, see also Wikipedia page) that were all the rage a few years ago not being used more?

It seems to be that as time go on, none disk based databases are being used less, e.g most applications are now built on conventional rational databases. I would have expected the opposite as RAM is getting close to being free for a lot of servers.

I am asking this, as I just read up on the stack-overflow-architecture and the page says

This is significant because Stack Overflow's database is almost completely in RAM and the joins still exact too high a cost.

But I don’t think this would be a problem if “pointers” and “collections” were used instead of the normal btree. Btree are a very clever to get round limits on disk access speed, e.g they trade CPU useage to reduce disk usage. However we now have so match ram.

But we still need database, as doing your own

Locking
Deadlock detection
Transaction logging
Recovering
Etc

Is very hard.

@S.Lott, Given we all spend so long choosing indexes, avoiding joins and investigating database performance problems. There must be a better way. A few years ago we were told the “in memory databases” was the better way. So before I jump into using one etc, I wish to know why other people are not using them more.

(I am unlikely to use TimesTen myself, as it is high priced ($41,500.00 / Processor) and I don’t like talking to Oracle sales people - I rather spend my time writing code.)

See also:

Update:

I asked this question a LONG time ago, these days Microsoft SQL Server have "In-Memory OLTP" that is a memory-optimized database engine integrated into the SQL Server engine. It is not cheap, but seems to be very fast for some workloads.

Solution

Most probably there are just no mature products of memory databases which could be used as a full replacement for a classic database.

Relational database are a very old concept. Although there were many approaches to move forward and develop new technologies, eg. object oriented databases, the relational databases didn't really change their concepts. Don't expect things to change too fast, since databases didn't change much in the last ten or fifteen years or even longer.

I think, development of technologies is not as fast as one might believe. It takes decades for new concepts to be matured and established. First of all in database technologies, where maturity is much more important then anything else.

In ten or twenty years, databases are probably not the same anymore as they are today. If in-memory databases are the future - nobody can tell this today - they just need some more time to develop.

OTHER TIPS

Nobody really answered the question "When should I consider using a in memory database and what are the issue to look out for?" so I'll give it a go.

You should consider an in-memory database if: 1. The target system has data to manage, but no persistent media 2. The performance requirement simply cannot be met with a persistent database

For #1, think of the TV Guide in your set-top box (STB). Low-end STB (i.e. those with no DVR capability) have no persistent storage, and need no persistent storage. But the database for a 400-channel, 14-day TV Guide is non-trivial. There's a performance requirement here, too, because data arrives from the transponder carousel at a high speed and it's a case of 'capture it or wait until the carousel comes around again'. But there's no need for persistence. We've all seen this; when you lose power at your home, when it comes back on the TV Guide says "will be available shortly" because it's provisioning itself from the transponder or cable head-end. Network routers share the same characteristics: no persistent storage, need to be fast, and the database can be provisioned from an external source (peer routers on the network, in this case, to repopulate the routing table).

There are endless examples of #2: Real-time targetting in military systems, high-frequency trading systems, and more.

Regarding the second part of the question, "issue to watch out for": There are many.

Make sure you're evaluating a true in-memory database if you need the performance that only an in-memory database can deliver. Caching a persistent database is not the same. Throwing a persistent database in a RAM-drive is not the same. Using an in-memory database that inherently does transaction logging (like TimesTen) is not the same (even if you log to /dev/null).

Make sure you're evaluating a database system, and not merely a cache (e.g. memcache). A database system will have support for transactions with the ACID properties, multiple indexing options, support concurrent access, and more.

About ACID: in-memory database systems do not lack the 'D' (durability). It simply has to be taken in context. Transactions in a persistent database are durable only so long as the media it's stored on is durable. The same thing is true for in-memory databases. In either case, if you care about durability, you better have a backup.

The trend seems to be to cache aggressively and use the database to populate the cache. Regardless of where the database lives, joins are still expensive so the preference seems to be to do the join once and cache the result in something like Memcached or Velocity.

There are still in-memory databases around and they are used, but it depends upon the context you want to use them. SQLite for example is often used as an in-memory database when testing data layers.

The most important reason is cargo culture, and the very low knowledge level in IT. Most applications work sufficiently well whatever the persistence solution used, and as computers are still getting faster each year, not enough people feel the pain and are capable of pinpointing the problem.

Microsoft and Oracle make too much money with their database products to make it (politically) possible for them to come up with better approaches.

The development costs of using a relational database are not made transparent so management has no idea that there is a problem, let alone a solution.

Well, in-memory databases generally lack the D (durability) in ACID (atomicity, consistency, isolation, durability) by their very nature. This can be overcome to an extent with "hybrid" approaches, however, at some point something (either the data itself, or a transaction log) has to be persisted somewhere to provide the durability aspect. This can generally slow down performance or introduce other non-desirable properties to an in-memory database solution

In contrast, most of todays RDBMS's have the full complement of ACID, as well as having many decades of development behind them. This has resulted in disk-based database systems that are very performant, especially with the many years of improvements and optimisations that modern day RDBMS system have seen (your BTree example being just one of many).

Another factor is our ability as application developers to reduce the load on the database by such mechanisms as caching, thereby squeezing much more perceived performance from the data layer of an application. Indeed, caching itself has seen extensive developments in recent years with distributed caching being common nowadays (just look at the number of users of memcached, for example).

Ironically, the modern day caching systems are, in many ways, slowly transmogrifying into something akin to a true in-memory database system. In-memory databases, like object-oriented databases, are very much the "new kids on the block", so it will be interesting to see where all of this goes in time. Oracle has now acquired TimesTen, and, according to this wikipedia article, Microsoft are looking at getting into the in-memory database market quite soon. That's two modern day "big players" in the traditional RDBMS field that are taking in-memory database systems seriously.

This is also an option: http://www.memsql.com/

I have not used it personally, but it's supposed to be along the lines of a drop-in replacement for MySQL in-memory.

Various Portable versions of SQL, that will work with same efficiency, designed for mobile devices mainly.

SQLite

SQL Server Compact Edition

These are just big players other options may be there, but big players handle minimum requirements with release of it.. :)

and in memory database, you have continuously back up the data if fluctuation or powercut arises you may loss the whole bunch. as in other that will be handled as its in secondary memory(HDD) and the chances of loss will be 10% compare to memory DB.

I hope this may help :)

The most typical use-case for a database is persistence, which makes most in-memory databases unsuitable. One popular reason to use an in-memory database is for testing purposes. But this requires you use either a database that can be set up both as in-memory and something else.

Popular choices in this area seems to be RavenDB for .Net developers and OrientDB for Java developers. Because both can function as in-memory databases, and "something else" depending on configuration, so you can use one or the other depending on your configuration (app.config in .Net, Maven or Ant settings in Java).

Data processing needs are getting more complex and the product ecosystem is evolving to meet these new needs. Disk-based RDBMS, in-memory cache, and in-memory databases are used to satisfy different needs. You should go with what suits your need -

Traditional RDBMS: Your MySQL cluster is fast enough, easy enough to maintain, and you like having the reliability of ACID-compliance.

In-memory distributed cahce: Your application needs to do fast reads and writes without worrying too much about consistency or complex transactions.

In-memory RDBMS:

(Speed): Your application needs to process data/requests faster than your disk-based database can.
(Complexity): You need to make complex transactional reads and writes with joins and aggregations and like to use the power of SQL.
(Scalability): You need to scale your database horizontally without downtime.
(Maintainability): You need the database to provide high availability, replication, load-balancing and disaster recovery without adding to your maintenance chores.
(Caveat): Your data should fit in memory (typically in terabytes).

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow