With a set of 500 words in a single choice, you're likely running into the limits of the recognizer's ability to disambiguate between them.
Without going into a ton of detail, speech recognizers work by matching phoneme sequences to possible words.
If a candidate sequence is considered "too unlikely", it is pruned; if the incoming audio doesn't produce a sufficiently high confidence for any candidate, the entire recognition is rejected (and a "false recognition" event is generated).
The wider the spread of words, the lower the individual confidences tend to be (and, conversely, with a very narrow spread of words the confidences can be unjustifiably high, resulting in false positive recognitions).
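
To make the effect of vocabulary size concrete, here is a toy sketch in Python. It is not how any real recognizer computes confidence: it assumes each candidate word gets a raw match score, turns those scores into confidences with a softmax, and rejects the result when the best confidence falls below a threshold. The function names, the scores, and the 0.5 threshold are all made up for illustration.

```python
import math

def confidences(scores):
    """Softmax-normalize raw match scores so the confidences sum to 1.

    Toy assumption: a real recognizer's confidence measure is more involved.
    """
    peak = max(scores.values())
    exps = {word: math.exp(s - peak) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

def recognize(scores, threshold=0.5):
    """Return (best_word, confidence), or (None, confidence) on rejection."""
    conf = confidences(scores)
    best = max(conf, key=conf.get)
    if conf[best] < threshold:
        return None, conf[best]  # analogous to a "false recognition" event
    return best, conf[best]

# Narrow choice: a clear match against one alternative is accepted.
small = {"yes": 2.0, "no": 0.0}
small_result = recognize(small)

# Wide choice: the same score for "yes", but 499 competing words.
large = {"yes": 2.0}
large.update({f"word{i}": 0.0 for i in range(499)})
large_result = recognize(large)

# Narrow choice, weak match: still accepted, because with only two
# candidates the best one always gets at least half the probability mass.
weak_result = recognize({"yes": 0.2, "no": 0.0})
```

In this toy model, "yes" scores about 0.88 against a single alternative but only about 0.015 against 499 alternatives, so the same audio is rejected once the choice is wide. The last case shows the opposite failure: with two candidates a barely better match still clears the threshold, which is the false positive problem described above.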
With a set of words this large, you're likely going to need some sort of dictation recognition model (which can be implemented, but it's a lot more complicated).