The dictation recognition engine in SAPI 5.1 simply can't recognize telephony-quality (8 kHz, 8-bit) audio, no matter what you do. It requires 22 kHz, 16-bit audio for any sort of reasonable error rate.
On top of that, the SAPI 5.1 dictation engine requires speaker-dependent training, even for unaccented American English.
Even the latest SAPI 5.4 SR engines require 22 kHz, 16-bit audio, although they do work better without training for unaccented voices. (Accented voices still require training.)
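
If you want to check a recording before feeding it to the recognizer, something like the following might help. It's only a rough sketch using Python's standard `wave` module, not anything from SAPI itself: the function names are made up for illustration, and reading "22 kHz" as 22050 Hz and treating the figures above as hard minimums are my assumptions.

```python
import io
import wave

# Thresholds taken from the answer above. Reading "22 kHz" as 22050 Hz and
# treating both values as hard minimums are assumptions, not SAPI constants.
MIN_SAMPLE_RATE_HZ = 22050
MIN_SAMPLE_WIDTH_BYTES = 2  # 2 bytes per sample = 16-bit audio


def meets_dictation_format(wav_source):
    """Return (ok, sample_rate_hz, bits_per_sample) for a PCM WAV source.

    wav_source may be a file path or a binary file-like object.
    """
    with wave.open(wav_source, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
    ok = rate >= MIN_SAMPLE_RATE_HZ and width >= MIN_SAMPLE_WIDTH_BYTES
    return ok, rate, width * 8


def make_test_wav(rate, width_bytes, n_frames=100):
    """Build a tiny in-memory mono WAV of placeholder bytes (hypothetical helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(width_bytes)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (n_frames * width_bytes))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # Compare a telephony-style format against the format the answer calls for.
    for rate, width in [(8000, 1), (22050, 2)]:
        print(rate, width * 8, meets_dictation_format(make_test_wav(rate, width)))
```

Note that this only inspects the WAV header. Upsampling an 8 kHz recording will make the check pass, but it doesn't put back the detail that telephony audio has already lost.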