Question

I have a dictionary that consists of words and their phonetic transcriptions. The words are all lower case, so there is not case-sensitive search involved.

The lexicon is really huge, and I need to load it quickly when my application starts. I would prefer reading it without having to read each entry separately.

I guess the way I stored and load it also affects how I keep the lexicon in memory

Thank you for any ideas.

Pas de solution correcte

Autres conseils

You probably want to store this as a Trie

This is an efficient way of storing a dictionary. Look at the following answers for more information

http://en.wikipedia.org/wiki/Trie

https://stackoverflow.com/questions/296618/what-is-the-most-common-use-of-the-trie-data-structure

Persisting a trie to a file - C

A few options come to mind:

  1. You could use sqlite, which uses mmap to map the file to memory, to store the lexicon so only what is accessed gets read. This is probably reasonable fast and reliable as well as the easiest to implement.
  2. You can mmap the file yourself
  3. Use seek operations to move the file pointer through the file without reading the whole thing. This will only help if the lexicon is structured in some way so you can find the right position without reading everything, i.e. it has to be a data structure that allows better than O(n) searching (a Trie usually being a good choice, as suggested by Salgar).
Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top