I am about to index 10 million titles together with their IDs (for now, their line numbers). The titles will be tokenised before they are stored. The data structure has to be something like <String, ArrayList<Integer>>, where each String is a token and the Integers are the line numbers of the titles containing it.

I have to build this tool in Java with persistent storage, avoiding an RDBMS if possible. Because this data structure is mutable, I couldn't find any tool that supports MultiMaps with the structure <String, ArrayList<Integer>> indexed by a BTree or any other persistent data structure.

I tried MapDB, but it turned out to accept only immutable values, which doesn't work in my case (ArrayList).

Any thoughts are appreciated.


Solution

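What you need is called a MultiMap. MapDB does not support those directly, but it has composite sets, which are almost as good. Instead of mapping each token to a mutable list, you store one immutable (token, line number) pair per occurrence in a sorted set, and read back all pairs that share a token with a range query.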

An example is here: https://github.com/jankotek/MapDB/blob/release-1.0/src/test/java/examples/MultiMap.java
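
To illustrate the composite-set idea without depending on MapDB itself, here is a minimal sketch that uses a plain in-memory java.util.TreeSet as a stand-in for a persistent sorted set. The names TokenIndex, Entry, put and get are made up for this illustration and are not part of MapDB; with MapDB you would obtain the sorted set from the database and use its own tuple type, as in the linked example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical helper: a multimap emulated with a sorted set of composite keys.
// The TreeSet here is only an in-memory stand-in for a persistent sorted set.
final class TokenIndex {

    // Immutable (token, line) pair, ordered by token first, then by line number.
    static final class Entry implements Comparable<Entry> {
        final String token;
        final int line;

        Entry(String token, int line) {
            this.token = token;
            this.line = line;
        }

        @Override
        public int compareTo(Entry other) {
            int byToken = token.compareTo(other.token);
            return byToken != 0 ? byToken : Integer.compare(line, other.line);
        }
    }

    private final NavigableSet<Entry> entries = new TreeSet<>();

    // Adding a value never mutates a stored list; it just inserts one more pair.
    public void put(String token, int line) {
        entries.add(new Entry(token, line));
    }

    // All line numbers for a token: a range scan from (token, MIN) to (token, MAX).
    public List<Integer> get(String token) {
        List<Integer> lines = new ArrayList<>();
        Entry low = new Entry(token, Integer.MIN_VALUE);
        Entry high = new Entry(token, Integer.MAX_VALUE);
        for (Entry e : entries.subSet(low, true, high, true)) {
            lines.add(e.line);
        }
        return lines;
    }

    public static void main(String[] args) {
        TokenIndex index = new TokenIndex();
        index.put("java", 3);
        index.put("java", 1);
        index.put("index", 2);
        System.out.println(index.get("java")); // prints [1, 3]
    }
}
```

Because every stored element is an immutable pair, the restriction on mutable values such as ArrayList no longer matters: adding a line number for a token is a single insert, and a lookup is a range scan over neighbouring keys.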

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow