Question

I have an application that receives data for thousands of subjects (say 50,000).

Each data tuple comprises a subjectId and text data.

I am looking for an embeddable Java database that provides the following:

  • Store the data quickly (thousands of tuples per second).
  • Look up the text data for a given subjectId.
  • Provide an efficient way to delete data older than X days.
  • Run embedded in the Java process.
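One common way to make the "delete data older than X days" requirement cheap is to partition data by day, so expiry means dropping whole partitions (for example, one database or table per day) instead of deleting tuples one at a time. The following is a minimal in-memory sketch of that idea, assuming a hypothetical `DayBucketedStore` class; in practice each day's map would be replaced by a partition in whichever embedded store you choose.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for an embedded store with one partition per day.
// Not thread-safe: this only illustrates the partitioning idea.
class DayBucketedStore {
    // Partitions ordered by day; each maps subjectId -> texts received that day.
    private final NavigableMap<LocalDate, Map<String, List<String>>> buckets = new TreeMap<>();

    void put(LocalDate day, String subjectId, String text) {
        buckets.computeIfAbsent(day, d -> new HashMap<>())
               .computeIfAbsent(subjectId, s -> new ArrayList<>())
               .add(text);
    }

    // Collects a subject's texts across all retained days, oldest day first.
    List<String> lookup(String subjectId) {
        List<String> result = new ArrayList<>();
        for (Map<String, List<String>> bucket : buckets.values()) {
            List<String> texts = bucket.get(subjectId);
            if (texts != null) {
                result.addAll(texts);
            }
        }
        return result;
    }

    // Drops every partition strictly older than the cutoff in a single operation.
    void purgeOlderThan(LocalDate cutoff) {
        buckets.headMap(cutoff, false).clear();
    }
}
```

The trade-off is that a lookup touches up to X partitions, which is usually acceptable when X is a small number of days.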

Berkeley DB Java Edition (JE) seems to meet my requirements, except that it's a key-value database and my data is inherently multivalued (many text entries per subjectId). I'm not sure whether storing duplicate keys will cause performance problems.
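In a key-value store that keeps keys sorted (Berkeley JE stores them in a B-tree), multivalued data can be modeled without duplicate keys by making each key unique: the subjectId, then a timestamp, then a sequence number. All values for one subject then sit next to each other and come back with a single range scan over the key prefix. JE also supports duplicate keys directly via `DatabaseConfig.setSortedDuplicates(true)`. The sketch below uses a `ConcurrentSkipListMap` as a stand-in for the store; the `CompositeKeyStore` class and the key layout are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration: a sorted, thread-safe map stands in for a B-tree key-value store.
class CompositeKeyStore {
    // Separator assumed never to appear inside a subjectId.
    private static final char SEP = '\u0000';
    private final ConcurrentSkipListMap<String, String> map = new ConcurrentSkipListMap<>();
    // Tie-breaker so two tuples with the same timestamp get distinct keys.
    private final AtomicLong sequence = new AtomicLong();

    // Key layout: subjectId, SEP, zero-padded timestamp, SEP, zero-padded sequence.
    // Zero padding makes string order match numeric order (timestamps assumed non-negative).
    void put(String subjectId, long timestampMillis, String text) {
        String key = subjectId + SEP
                + String.format("%019d", timestampMillis) + SEP
                + String.format("%019d", sequence.getAndIncrement());
        map.put(key, text);
    }

    // Every text for one subject, oldest first, via one range scan over the key prefix.
    List<String> lookup(String subjectId) {
        String from = subjectId + SEP;
        String to = subjectId + (char) (SEP + 1);
        return new ArrayList<>(map.subMap(from, true, to, false).values());
    }
}
```

With this layout, deleting by age needs a scan over all keys, because keys are grouped by subject rather than by time; a secondary index on time, or per-day partitions, avoids that.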

What other embeddable options exist for this simple tuple schema?


Solution

If you want the database to index the data for you, you want a document-oriented database. If you only need to look up by key, you can serialize the data yourself using something like Kryo or Protocol Buffers. If you can work with a schema, SQLite or Derby can be good solutions.

OrientDB and Neo4j are graph databases that can be embedded. OrientDB is less mature, but has a better license. Cassandra is a column-oriented store that you can run embedded.

LevelDB and Bitcask are database libraries. Both have good licenses, but the Java versions are ports from other languages (LevelDB from C++, Bitcask from Erlang) and may not have the full feature set. LevelDB is the better choice if you can't keep the full key set in memory, since Bitcask holds an index of every key in RAM.
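For the "serialize it yourself" option, the idea is to pack all texts for one subjectId into a single value stored under that key. Kryo or Protocol Buffers would normally do the encoding; the minimal sketch below swaps in the JDK's `DataOutputStream`/`DataInputStream` so it runs without dependencies, and the `TextListCodec` name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical codec: packs all texts for one subjectId into a single byte[] value.
final class TextListCodec {
    private TextListCodec() {}

    static byte[] encode(List<String> texts) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(texts.size());   // element count first
            for (String text : texts) {
                out.writeUTF(text);       // length-prefixed; limited to 65535 encoded bytes per text
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static List<String> decode(byte[] value) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(value))) {
            int count = in.readInt();
            List<String> texts = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                texts.add(in.readUTF());
            }
            return texts;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Lookups become a single get plus a decode, but appending a tuple means reading, re-encoding, and rewriting the whole list for that subject, which can be costly at thousands of writes per second.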

Licensed under: CC-BY-SA with attribution