I'm working on bioinformatics software, written in C, that generates millions of strings (formed from the nucleotide bases A, C, G, T), usually longer than 30 characters.
I need a database that stores this data on disk fast enough not to become a bottleneck for the rest of the software, and without consuming too much RAM. It also has to be fully embedded in my application: I don't want to force my users to install a SQL server or anything like that.
I've already tried HamsterDB, SQLite, Kyoto Cabinet and MapDB without success. The problem is that I need to insert or update records at a rate of at least ~50k operations/sec. With some tuning, SQLite was the fastest: it reaches 18k operations/sec (with synchronous = OFF, journal_mode = OFF, transactions, ignore_check_constraints = ON, a cache_size of 500,000 and pre-compiled statements).
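For reference, here is roughly what that tuning looks like as SQL (a sketch of my setup, not a complete program; the prepared statements and batching live on the C side):

```sql
PRAGMA synchronous = OFF;            -- don't fsync after each write
PRAGMA journal_mode = OFF;           -- no rollback journal
PRAGMA ignore_check_constraints = ON;
PRAGMA cache_size = 500000;          -- pages kept in memory

-- plus wrapping batches of inserts in BEGIN ... COMMIT
-- and reusing pre-compiled (prepared) statements
```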
Each sequence is classified as type A or type B, and I need to know how many of each kind I have. Right now I use the sequence itself as the key and keep one counter for A types and another for B types. In SQLite I'm using columns and a command like this:
INSERT OR REPLACE INTO events (main_seq,qnt_A,qnt_B) VALUES (@SEQ,COALESCE((SELECT qnt_A FROM events WHERE main_seq=@SEQ)+1,1),(SELECT qnt_B FROM events WHERE main_seq=@SEQ))
This is slower than a plain INSERT INTO, but when the sequence already exists in the DB I need to increment just one of the columns.
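One possible alternative, assuming SQLite 3.24 or newer and a primary key (or unique index) on main_seq: the UPSERT syntax does the increment in a single statement, without the correlated subqueries (a sketch for the type-A case; the type-B case would mirror it):

```sql
INSERT INTO events (main_seq, qnt_A, qnt_B) VALUES (@SEQ, 1, 0)
ON CONFLICT(main_seq) DO UPDATE SET qnt_A = qnt_A + 1;
```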
With Kyoto Cabinet I get really high speed, but it only supports string records, and I need to store and update integers to count how many As and Bs I have.
Does anyone know another good DB that can satisfy my needs in terms of write speed and record flexibility?