Question

I have a list of sentences with misspelled prepositions, and a list of correctly spelled prepositions:

ref_data = ['near','opposite','off','towards','behind','ahead','below','above','under','over','in','inside','outside']

I need to compute the Soundex code of each word in my data and replace the word with my reference word if the Soundex codes match. Here's my code:

for line in text1:
for word in line.split():
    if jellyfish.soundex(word)==jellyfish.soundex([words,int in enumerate(ref_data)])
       word = #replace code here

I am really confused. text1 contains sentences like ['he was nr the fountain',...many more]. Please help; my syntax is wrong.


Solution

I'd use:

import jellyfish

# mapping from Soundex code to the correctly spelled word
soundex_to_ref = {jellyfish.soundex(w): w for w in ref_data}

for line in text1:
    words = [soundex_to_ref.get(jellyfish.soundex(w), w) for w in line.split()]

This produces a list of words for each line, with all words that match correctly-spelled words by soundex replaced by the correctly-spelled word.

The [... for ... in ...] syntax is a list comprehension; it produces a new value for each item in the for loop. So, for each word in line.split(), the result of the soundex_to_ref.get(jellyfish.soundex(w), w) expression is added to the output list.
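
To make the equivalence concrete, here is a minimal sketch comparing the comprehension with an explicit loop. The corrections dictionary is a hypothetical stand-in for the Soundex-based lookup, used only so the snippet runs without jellyfish installed.

```python
# Hypothetical lookup table standing in for the Soundex-based one.
corrections = {'nr': 'near'}
line = 'he was nr the fountain'

# List comprehension form: one output item per word in the line.
words = [corrections.get(w, w) for w in line.split()]

# Equivalent explicit loop that builds the same list step by step.
words_loop = []
for w in line.split():
    words_loop.append(corrections.get(w, w))

print(words)
print(words == words_loop)
```

Both forms build the same list; the comprehension is just the more compact spelling.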

The soundex_to_ref object is a dictionary, generated from the ref_data list; for each word in that list the dictionary has a key (the soundex value for that word), and the value is the original word. This lets us look up reference words easily for a given soundex.

dict.get() lets you look up a key in a dictionary, and if the key is not present, a default is returned instead. soundex_to_ref.get(jellyfish.soundex(w), w) computes the Soundex code for the current word w and looks up the matching reference word; if that code is not present in the dictionary, the original word w is returned unchanged.
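
As a small illustration of the lookup with a default, the sketch below builds a reduced version of the mapping by hand. The three Soundex codes are written out manually and are assumed to match what jellyfish.soundex returns for those words.

```python
# Hand-written Soundex codes for three reference words; assumed to match
# what jellyfish.soundex returns for them.
codes = {'near': 'N600', 'off': 'O100', 'in': 'I500'}

# Invert to code -> word, mirroring the shape of soundex_to_ref.
soundex_to_ref = {code: word for word, code in codes.items()}

# 'nr' encodes to N600, which is a key, so the reference word comes back.
print(soundex_to_ref.get('N600', 'nr'))

# 'fountain' encodes to F535, which is not a key, so the default comes back.
print(soundex_to_ref.get('F535', 'fountain'))
```

Note that if two reference words ever shared a code, the dictionary would keep only the one added last.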

You can join the words list back into a sentence by using:

line = ' '.join(words)

You can rebuild text1 in one expression with:

text1 = [' '.join([soundex_to_ref.get(jellyfish.soundex(w), w) for w in line.split()])
         for line in text1]
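
Putting the pieces together, the following is a rough end-to-end sketch. So that it runs without third-party packages, it uses a small hand-rolled soundex function as a stand-in for jellyfish.soundex; that function is a simplified American Soundex written for this example, not the library's implementation, and in practice you would call jellyfish.soundex instead.

```python
# Simplified American Soundex, used here only as a stand-in for
# jellyfish.soundex so the sketch runs without third-party packages.
def soundex(word):
    groups = [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
              ("L", "4"), ("MN", "5"), ("R", "6")]
    codes = {ch: digit for letters, digit in groups for ch in letters}
    letters = [ch for ch in word.upper() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "HW":  # H and W do not separate letters with equal codes
            prev = code
    return (result + "000")[:4]


ref_data = ['near', 'opposite', 'off', 'towards', 'behind', 'ahead', 'below',
            'above', 'under', 'over', 'in', 'inside', 'outside']
text1 = ['he was nr the fountain']

# Mapping from Soundex code to the correctly spelled word.
soundex_to_ref = {soundex(w): w for w in ref_data}

# Rebuild every sentence, replacing words whose code matches a reference word.
text1 = [' '.join(soundex_to_ref.get(soundex(w), w) for w in line.split())
         for line in text1]
print(text1)
```

One caveat worth keeping in mind: Soundex is a coarse match, so unrelated words can share a code. Under the stand-in above, 'offer' and 'over' encode identically, so a sentence containing "offer" would be rewritten to use "over".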
Licensed under: CC-BY-SA with attribution