Question

In my use case the data is relatively small (~1000.000 Strings), but i have to access as fast as possible (every nano sec counts), from a multithreaded environment (implemented in pure Java) Currently I'm using redis (in localhost) and I'm basically happy with it, but i want to know if there is some better alternative, since redis has all the network stuff, and is not designed for multithred stuff. The persistence is also very low priority for my use case.

  • I want to run in the same machine (no networking at all)
  • I want to be as fast as possible
  • Relativity small data (my current Redis instance is about 20MB max in memory)

i don't want to :

  • use other solution than NoSql database.
Was it helpful?

Solution

There are lots of great NoSQL databases that function as a key-value store. Each have unique capabilities.

Redis is great in a single server and is dead easy to install and use. But Redis becomes difficult to shard and manage when your data outgrows beyond a single server.

Thumbtack Technologies (of NYC) published two white papers comparing performance and reliability of MongoDB, Cassandra and Aerospike. The papers are very objective, the benchmarks where done using the YCSB benchmarking tool and were conducted on the same hardware.

Which one to used depends on what you need. MongoDB is a feature rich key-value store with lots of nice programmer features. It offers queries on secondary indexes and is a very good document store. It's a In-memory database so all you data must fit into RAM. Mongo can be clustered and I have heard that it becomes tricky to manage if you have a big cluster.

CouchBase is great for storing large amounts of data and a portion of that data is cached in RAM. So its very quick if the value you are after is in the cache working set. This is great if your use case mostly works with hot data and accesses cold data less often.

Cassandra is really good for a 'write heavy' use case. Its easy to use and is a good programmer experience. It is written in Java and periodically pauses while it does GC, so you need to tune you GC parameters.

Aerospike is good for storing large amounts of data in a small number of servers. It boasts single digit millisecond (or better) latencies, high availability and high reliability, and it is probably (IMHO) the easiest to maintain and scale. It is multi-code aware, NUMA node aware and has a self-healing zero touch cluster technology. It's great for "real-time" use cases where access to any record needs to be fast and predictable. Aerospike is my favorite.

Cassandra, CouchBase, MongoDB and Aerospike all have an "analytics" capability, and which one you choose depends on the use case and your performance envelope.

OTHER TIPS

You have 1 million strings?

That's a tiny amount of data. If you want speed than nothing will be faster than just using in-memory data structure inside your application code itself. Just store all the data in a file, load up into a list on program startup then serialize back to the file if you need to save it.

Avoid all the overhead of running and interacting with a database - especially you don't care about persistence.

A simple flat file with each line being a separate string will take about 100ms to read and parse.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top