Question

I have a dictionary that consists of words and their phonetic transcriptions. The words are all lower case, so no case-insensitive search is involved.

The lexicon is really huge, and I need to load it quickly when my application starts. I would prefer reading it without having to read each entry separately.

I guess the way I store and load it also affects how I keep the lexicon in memory.

Thank you for any ideas.

No correct solution

OTHER TIPS

You probably want to store this as a trie.

This is an efficient way of storing a dictionary. Look at the following links for more information:

http://en.wikipedia.org/wiki/Trie

https://stackoverflow.com/questions/296618/what-is-the-most-common-use-of-the-trie-data-structure

Persisting a trie to a file - C
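
To make the suggestion concrete, here is a minimal sketch of a trie that maps words to transcriptions. The class and method names (`TrieNode`, `Trie`, `insert`, `lookup`) and the sample transcriptions are made up for illustration. A dict-per-node trie like this only shows the structure; for a really huge lexicon you would likely want a more compact, file-backed layout.

```python
class TrieNode:
    __slots__ = ("children", "transcription")

    def __init__(self):
        self.children = {}         # maps one character to the next TrieNode
        self.transcription = None  # set only on nodes that end a complete word


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, transcription):
        """Walk down the trie, creating nodes as needed, and store the value."""
        node = self.root
        for ch in word:
            nxt = node.children.get(ch)
            if nxt is None:
                nxt = TrieNode()
                node.children[ch] = nxt
            node = nxt
        node.transcription = transcription

    def lookup(self, word):
        """Return the transcription for word, or None if it is not stored."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.transcription


# Hypothetical sample entries, for illustration only.
lexicon = Trie()
lexicon.insert("cat", "k ae t")
lexicon.insert("cart", "k aa r t")
```

A lookup costs one step per character of the word, regardless of how many entries the lexicon holds. Words that share a prefix ("cat", "cart") share the nodes for that prefix.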

A few options come to mind:

  1. You could store the lexicon in SQLite, which reads database pages on demand (and can optionally use memory-mapped I/O), so only what is accessed gets read. This is probably reasonably fast and reliable, as well as the easiest to implement.
  2. You can mmap the file yourself.
  3. Use seek operations to move the file pointer through the file without reading the whole thing. This will only help if the lexicon is structured in some way so you can find the right position without reading everything, i.e. it has to be a data structure that allows better than O(n) searching (a Trie usually being a good choice, as suggested by Salgar).
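
As a rough sketch of options 2 and 3, the example below maps a file of sorted, fixed-width records and binary-searches it, so only the pages that a lookup touches are read. The record layout (32 bytes for the word, 64 for the transcription), the file name, and the helper names `write_lexicon` and `lookup` are assumptions made for this example, not something from the question. It uses a sorted array instead of a trie to keep the code short; both give better than O(n) lookups.

```python
import mmap
import os
import struct
import tempfile

WORD_LEN = 32   # assumed maximum word size in bytes (UTF-8)
TRANS_LEN = 64  # assumed maximum transcription size in bytes (UTF-8)
RECORD = struct.Struct(f"<{WORD_LEN}s{TRANS_LEN}s")  # struct pads with NUL bytes


def write_lexicon(path, entries):
    """Write entries as fixed-width records, sorted by the encoded word.

    Note: struct silently truncates values longer than the field width,
    so this sketch assumes every entry fits.
    """
    encoded = sorted((w.encode("utf-8"), t.encode("utf-8")) for w, t in entries.items())
    with open(path, "wb") as f:
        for word, trans in encoded:
            f.write(RECORD.pack(word, trans))


def lookup(mm, word):
    """Binary-search the mapped records; return the transcription or None."""
    key = word.encode("utf-8")
    lo, hi = 0, len(mm) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        start = mid * RECORD.size
        candidate = mm[start:start + WORD_LEN].rstrip(b"\0")
        if candidate == key:
            raw = mm[start + WORD_LEN:start + RECORD.size]
            return raw.rstrip(b"\0").decode("utf-8")
        if candidate < key:
            lo = mid + 1
        else:
            hi = mid
    return None


# Hypothetical sample data, written to a temporary file for the demo.
path = os.path.join(tempfile.mkdtemp(), "lexicon.bin")
write_lexicon(path, {"cat": "k ae t", "dog": "d ao g", "cart": "k aa r t"})

with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    cat = lookup(mm, "cat")
    missing = lookup(mm, "cow")
```

Opening the mapping is cheap no matter how large the file is, since nothing is parsed at startup. The cost is paid per lookup: about log2(n) record reads.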
Licensed under: CC-BY-SA with attribution