How efficient would it be to use an in-memory database to store millions of temporary values?

https://stackoverflow.com/questions/3936044

30-09-2019

Question

My application currently stores millions of Double elements for a calculation. These values are only temporary values before they are used for a specific algorithm that is run at the end of the calculation. Once this calculation is done, the millions of values can be discarded.

The full story is here, if you need more details.

One of the solutions that was proposed is to use an in-memory database.

So if I go with this solution, I will use this database to store my values in a table to replace my current Map<String, List<Double>>, like:

create table CALCULATION_RESULTS_XXX (
  deal_id varchar2,
  val number  -- renamed from "values", which is a reserved word in SQL
);

(one table per calculation, XXX is the calculation ID)

So during the calculation, I will do the following:

When the calculation is started, I create the CALCULATION_RESULTS_XXX table.
Every time I need to add a value, I insert a record in this table.
At the end of the calculation, I use the table content for my algorithm.
Finally, I drop this table.

As explained in the other question, my calculation may currently store several hundred MB of data in memory, as a list of 30 * 1,000,000 Double values will need about 240 MB.
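For reference, the 240 MB figure can be checked with a few lines of Java. This is only a rough sketch: the class and method names are made up for illustration, and it counts the raw 8-byte payload of each double only. A List<Double> holds boxed objects, so the real footprint is higher (object header plus a reference per element), by an amount that depends on the JVM.

```java
public class MemoryEstimate {
    // Bytes needed for the raw 64-bit payload of `count` doubles.
    // Boxing overhead of java.lang.Double is deliberately NOT included.
    static long primitiveBytes(long count) {
        return count * Double.BYTES;
    }

    public static void main(String[] args) {
        long count = 30L * 1_000_000L;      // 30 lists of 1,000,000 values
        long bytes = primitiveBytes(count); // 240,000,000 bytes
        System.out.println(bytes / 1_000_000 + " MB of raw double data");
    }
}
```

Running it prints "240 MB of raw double data", which matches the estimate above when MB is read as 10^6 bytes.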

The questions now:

If I go with an in-memory database, will my memory consumption decrease?
What are the specific points that I will have to take care of regarding database usage (or table creation), data insertion, etc.?
I think I will choose the H2 database. Do you think it's the best choice for my needs?

Solution

The problem is sufficiently simple that you really need to just give it a go and see how the (performance) results work out.

You already have an implementation that just uses simple in-memory structures. Personally, given that even the cheapest computer from Dell comes with 1GB+ of RAM, I'd say you might as well stick with that. That aside, it should be fairly simple to whack in a database or two. I'd consider Sleepycat Berkeley DB (which is now owned by Oracle), because you don't need to use SQL and it should be quite efficient (it does support Java).

If the results are promising, I'd then consider further investigation, but this really should only take a few days' work at most, including the benchmarking.

OTHER TIPS

A simple HashMap backed by Terracotta would do better and would let you store collections bigger than the JVM's virtual memory.

Embedded databases, especially the SQL-based ones, will add complexity and overhead to your code, so they aren't worth it. If you really need persistent storage with random access, try one of the NoSQL databases, like CouchDB, Cassandra or Neo4j.

I don't know whether it will be faster, so you'd have to try it. What I do want to recommend is to do batch inserts of an entire list when you don't immediately need that list anymore. Don't save value by value :)
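A batch insert of one whole list could look roughly like the sketch below, which uses plain JDBC (PreparedStatement.addBatch and executeBatch). The class name, the table name CALCULATION_RESULTS_1 and the column names are assumptions made for illustration, and the Connection is expected to come from whichever database you end up trying.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchInsertSketch {
    // Hypothetical table and column names; adapt them to the real schema.
    static final String INSERT_SQL =
        "INSERT INTO CALCULATION_RESULTS_1 (deal_id, val) VALUES (?, ?)";

    // Queues every value of one deal's list and sends them in a single batch.
    // Returns the number of statements that were part of the batch.
    static int insertAll(Connection conn, String dealId, List<Double> values)
            throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // one transaction for the whole list
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (double v : values) {
                ps.setString(1, dealId);
                ps.setDouble(2, v);
                ps.addBatch();     // queue the row, do not execute yet
            }
            int[] counts = ps.executeBatch();
            conn.commit();
            return counts.length;
        } catch (SQLException e) {
            conn.rollback();       // discard a partially inserted list
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

Whether this actually beats value-by-value inserts by a useful margin depends on the database and driver, so it is worth measuring both.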

If your end algorithm can be expressed in SQL, it might also be worth your while to do that and not load all the Lists back in. In any case, don't put anything like an index or constraint on the values, and preferably don't allow NULL either (if possible). Maintaining indices and constraints costs time, and allowing NULL can also cost time or create overhead. deal_ids can be (and are) of course indexed, as they're primary keys.

This isn't very much but at least better than a single down-voted answer :)

There really is no reason at all to add an external component to make your program run slower. Compress the data block and write it to a file if you need to handle more than the internal memory available. A workstation can now take 192GB of RAM, so you can't afford to waste much time on it.
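As a minimal sketch of the compress-and-write idea, using only the JDK (GZIP from java.util.zip, with a length prefix so the block can be read back): the class name and file layout here are assumptions for illustration, not something taken from the question.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedDoubleFile {
    // Writes the element count followed by the values, gzip-compressed.
    static void write(Path file, double[] values) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
                new GZIPOutputStream(Files.newOutputStream(file))))) {
            out.writeInt(values.length);
            for (double v : values) {
                out.writeDouble(v);
            }
        }
    }

    // Reads back a block previously produced by write().
    static double[] read(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(
                new GZIPInputStream(Files.newInputStream(file))))) {
            double[] values = new double[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readDouble();
            }
            return values;
        }
    }
}
```

A typical use would be to call write(Files.createTempFile("calc", ".bin.gz"), block) once a list is complete and read it back just before the final algorithm runs. How much space gzip saves depends heavily on the data; doubles with little repetition may barely compress.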

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow