I'd use:
import jellyfish

# mapping from soundex to correct word
soundex_to_ref = {jellyfish.soundex(w): w for w in ref_data}
for line in text1:
    words = [soundex_to_ref.get(jellyfish.soundex(w), w) for w in line.split()]
This produces a list of words for each line, with all words that match correctly-spelled words by soundex replaced by the correctly-spelled word.
The [... for ... in ...] syntax is a list comprehension; it produces a new value for each item in the for loop. So, for each word in line.split(), we put the result of the soundex_to_ref.get(jellyfish.soundex(w), w) expression into the output list.
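If the comprehension syntax is unfamiliar, it may help to see it next to the equivalent explicit loop. This is only a sketch: the replacements dictionary and the sample line are made up for illustration and stand in for soundex_to_ref and a line from text1.

```python
# Hypothetical lookup table standing in for soundex_to_ref.
replacements = {"teh": "the", "wrold": "world"}
line = "hello teh wrold"

# List comprehension: one output item per word in line.split().
words = [replacements.get(w, w) for w in line.split()]

# The same thing written as an explicit loop.
words_loop = []
for w in line.split():
    words_loop.append(replacements.get(w, w))

print(words)
print(words == words_loop)
```

Both forms build the same list; the comprehension is just the compact spelling.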
The soundex_to_ref object is a dictionary, generated from the ref_data list; for each word in that list the dictionary has a key (the soundex value for that word), and the value is the original word. This lets us look up reference words easily for a given soundex.
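One thing worth knowing about building the dictionary this way: if two reference words produce the same key, the later one overwrites the earlier one. The sketch below shows that with toy_key, a hypothetical stand-in for jellyfish.soundex (first letter plus word length) chosen only so the example runs without the library; the word list is made up too.

```python
def toy_key(word):
    # Hypothetical stand-in for jellyfish.soundex: first letter plus length.
    return word[0].upper() + str(len(word))

ref_data = ["apple", "angle", "banana"]

# Same shape as soundex_to_ref: key -> reference word.
key_to_ref = {toy_key(w): w for w in ref_data}

# "apple" and "angle" share the key "A5", so only the later word survives.
print(key_to_ref)
```

With real soundex codes the same collision rule applies, so the order of ref_data decides which word wins when codes clash.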
dict.get() lets you look up a key in a dictionary, and if it is not present, a default is returned. soundex_to_ref.get(jellyfish.soundex(w), w) creates the soundex for the current word w, looks up a reference word, and if the soundex is not present in the dictionary, falls back to the original word, so unmatched words pass through unchanged.
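Here is a minimal sketch of how dict.get() behaves with and without a default. The lookup dictionary and its key are written by hand for illustration rather than computed by jellyfish.

```python
# Hand-written table: one code mapped to one reference word.
lookup = {"R163": "Robert"}

hit = lookup.get("R163", "Rupert")   # key present: the stored value is returned
miss = lookup.get("X000", "xylo")    # key absent: the default is returned
no_default = lookup.get("X000")      # key absent, no default given: None

print(hit, miss, no_default)
```

Passing the word itself as the default is what keeps unknown words in the output instead of turning them into None.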
You can join the words list back into a sentence by using:
line = ' '.join(words)
You can rebuild text1 in one expression with:
text1 = [' '.join([soundex_to_ref.get(jellyfish.soundex(w), w) for w in line.split()])
         for line in text1]
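If you want to reuse this, one option is to wrap the whole thing in a function that takes the key function as a parameter. This is a sketch, not part of jellyfish: correct_lines and its arguments are names I made up, and the demo uses str.lower as a toy key so it runs without the library installed.

```python
def correct_lines(lines, ref_words, key):
    """Replace each word by the reference word that has the same key, if any."""
    # key -> reference word (later reference words win on key collisions).
    key_to_ref = {key(w): w for w in ref_words}
    return [
        " ".join(key_to_ref.get(key(w), w) for w in line.split())
        for line in lines
    ]

# Toy key (case-insensitive match) instead of a phonetic code.
fixed = correct_lines(["the quick BROWN fox"], ["Brown", "Fox"], str.lower)
print(fixed)
```

With jellyfish installed you would pass key=jellyfish.soundex to get the behaviour from the answer. Note that line.split() followed by ' '.join(...) collapses runs of whitespace into single spaces, so the original spacing is not preserved.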