Replacing word with same soundex

Question

I'd use:

import jellyfish

# mapping from soundex to correct word (ref_data is the list of correctly-spelled words)
soundex_to_ref = {jellyfish.soundex(w): w for w in ref_data}

for line in text1:
    words = [soundex_to_ref.get(jellyfish.soundex(w), w) for w in line.split()]

For each line, this produces a list of words in which every word whose soundex matches that of a correctly-spelled reference word is replaced by that reference word.

The [... for ... in ...] syntax is a list comprehension; it produces one new value for each item the for clause iterates over. So, for each word w in line.split(), the result of the expression soundex_to_ref.get(jellyfish.soundex(w), w) is added to the output list.
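To make the equivalence concrete, here is a small sketch that writes the same step as a comprehension and as an explicit loop. The names fake_codes, fake_soundex and lookup are hypothetical stand-ins (for jellyfish.soundex and soundex_to_ref) so that the snippet runs without the library installed:

```python
# Toy stand-ins (assumptions for illustration, not jellyfish itself).
fake_codes = {"Rupert": "R163", "Robert": "R163"}

def fake_soundex(word):
    # Fall back to the word itself when we have no code for it.
    return fake_codes.get(word, word)

lookup = {"R163": "Robert"}
line = "hello Rupert"

# List comprehension: one output item per word in the line.
words = [lookup.get(fake_soundex(w), w) for w in line.split()]

# The same logic written as an explicit loop.
words_loop = []
for w in line.split():
    words_loop.append(lookup.get(fake_soundex(w), w))

print(words)       # ['hello', 'Robert']
print(words_loop)  # ['hello', 'Robert']
```

Both forms build the same list; the comprehension is just the more compact spelling.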

The soundex_to_ref object is a dictionary, generated from the ref_data list; for each word in that list the dictionary has a key (the soundex value for that word), and the value is the original word. This lets us look up reference words easily for a given soundex.
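One thing worth knowing about this dictionary: if two reference words share a soundex, only one of them can be stored under that key. The sketch below uses a hard-coded codes table (an assumption, standing in for jellyfish.soundex) to show which word wins, plus one possible way to keep the first word instead:

```python
# Hard-coded Soundex codes so the sketch runs without jellyfish.
# Smith/Smyth -> S530 and Robert -> R163 are standard Soundex results.
codes = {"Smith": "S530", "Smyth": "S530", "Robert": "R163"}
ref_data = ["Smith", "Smyth", "Robert"]

# Same shape as the dictionary comprehension in the answer.
soundex_to_ref = {codes[w]: w for w in ref_data}
# "Smith" and "Smyth" share S530, so the later word overwrites the earlier one.
print(soundex_to_ref)  # {'S530': 'Smyth', 'R163': 'Robert'}

# Alternative: keep the first word seen for each code.
first_wins = {}
for w in ref_data:
    first_wins.setdefault(codes[w], w)
print(first_wins)  # {'S530': 'Smith', 'R163': 'Robert'}
```

Which behaviour you want depends on how ref_data is ordered; if it is sorted by preference, keeping the first entry may be the better fit.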

dict.get() looks up a key in a dictionary and returns a default if the key is not present. soundex_to_ref.get(jellyfish.soundex(w), w) computes the soundex for the current word w and looks up the matching reference word; if the soundex is not present in the dictionary, the original word w is returned unchanged.
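A minimal illustration of the two outcomes of dict.get(), using a hypothetical one-entry mapping with the soundex keys written out by hand:

```python
soundex_to_ref = {"R163": "Robert"}  # hypothetical one-entry mapping

# Key present: the stored reference word is returned.
hit = soundex_to_ref.get("R163", "Rupert")

# Key absent: the default (here, the original word) is returned.
miss = soundex_to_ref.get("S530", "Smyth")

print(hit)   # Robert
print(miss)  # Smyth
```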

You can join the words list back into a sentence by using:

line = ' '.join(words)

You can rebuild text1 in one expression with:

text1 = [' '.join([soundex_to_ref.get(jellyfish.soundex(w), w) for w in line.split()])
         for line in text1]
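If you need this in more than one place, the same logic can be wrapped in a function. This is only a sketch: replace_by_code and its code parameter are names made up here, and the demo uses a toy key function so it runs without jellyfish installed.

```python
def replace_by_code(lines, ref_data, code):
    """Replace each word by the reference word sharing its code, if any.

    `code` is any callable mapping a word to a phonetic key,
    for example jellyfish.soundex.
    """
    code_to_ref = {code(w): w for w in ref_data}
    return [' '.join(code_to_ref.get(code(w), w) for w in line.split())
            for line in lines]

# Toy key function (an assumption for the demo, not a real Soundex).
toy_codes = {"Robert": "R163", "Rupert": "R163"}

def toy_code(word):
    return toy_codes.get(word, word)

result = replace_by_code(["Rupert said hi", "no match here"], ["Robert"], toy_code)
print(result)  # ['Robert said hi', 'no match here']
```

With jellyfish available, the call would be replace_by_code(text1, ref_data, jellyfish.soundex). Note that line.split() followed by ' '.join() collapses runs of whitespace into single spaces, and any punctuation attached to a word is passed to the key function as-is.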