Question

I've constructed a Trie with about 400,000 nodes from a dictionary having about 180,000 words. The problem is that the trie is taking too long to construct on my phone.

So I've decided to create the trie once and store it in some format to the disk, for fast recreation whenever needed. But I'm unable to come up with a good format to store it.

What would be the most efficient format to encode the trie, so that it can be reconstructed from the file as fast as possible?

Was it helpful?

Solution

If your Trie data structure implements serializable then writing to and from a file should be fairly straight forward. Java will take care of the file representation.

See this link.

OTHER TIPS

Maybe good idea - to keep tried in the memory buffer in the position-independent code, and read it into memory by mmap(). This is mostly quick way to work with trie from "cold start".

Also, maybe you can keep data not in the tries, but in the hashtable. By this method, you can keep in the memory only "bucket index", which is very small. And, when compute hash - pread() bucket into memory from file, and search in loaded part.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top