Question

I have two lists:

wordlist =  ['A', 'Aani', 'Aaron', 'Aaronic',
             'Aaronical', 'Aaronite', 'Aaronitic',
             'Aaru', 'Ab', 'Ababdeh']

and

wordlist_compound = [['A','0'], ['Aaronic','1'], ['Key','2'],
                     ['Aaronical','3'], ['Aaronite','4'], ['Yes','5']]

I would like to take the intersection of the two lists by word and collect the matching word/number pairs in a third list, wordlist_final, so that wordlist_final looks like:

[['A','0'], ['Aaronic','1'], ['Aaronical','3'], ['Aaronite','4']]

My current code looks like:

wordlist_final = []
for index, word in enumerate(wordlist):
    for word_comp in wordlist_compound:
        if word[index] == wordlist_compound[index][0]:
            wordlist_final.append(wordlist_compound[index])

But I'm getting a "string index out of range" error.

Solution

Your output is easily accomplished using a list comprehension:

wl = ['A', 'Aani', 'Aaron', 'Aaronic', 'Aaronical', 'Aaronite', 'Aaronitic', 'Aaru', 'Ab', 'Ababdeh']
wlc = [['A', '0'], ['Aaronic', '1'], ['Key', '2'], ['Aaronical', '3'], ['Aaronite', '4'], ['Yes', '5']]

# Keep each [word, number] pair whose word also appears in wl.
print([[word, i] for word, i in wlc if word in wl])
# [['A', '0'], ['Aaronic', '1'], ['Aaronical', '3'], ['Aaronite', '4']]

An alternative list comprehension, which reuses the original inner lists instead of building new ones:

print([li for li in wlc if li[0] in wl])

If you want a looping structure:

wlf = []
for word, i in wlc:
    if word in wl:
        wlf.append([word,i])

print(wlf)
# [['A', '0'], ['Aaronic', '1'], ['Aaronical', '3'], ['Aaronite', '4']]

Python sequences usually don't need to be enumerated to just deal with the objects in the sequence. You usually only need to use enumerate if there is something about the index or order that is 'data' in addition to the sequence itself.
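As a rough illustration of that distinction, the sketch below contrasts the two cases. The names words_only and positions are introduced here only for the example: plain iteration is enough when only the items matter, while enumerate earns its place when the position itself is part of the result.

```python
wordlist_compound = [['A', '0'], ['Aaronic', '1'], ['Key', '2']]

# Only the items matter: iterate directly and unpack each pair.
words_only = [word for word, number in wordlist_compound]

# The position is data too: enumerate supplies it alongside each item.
positions = [(index, pair[0]) for index, pair in enumerate(wordlist_compound)]

print(words_only)
print(positions)
```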

Here you are taking each element in wordlist_compound and testing whether its word is a member of wordlist, so there is no need for enumeration. You also greatly simplify the task if you reverse the loops: loop over wordlist_compound in the outer loop rather than over wordlist, as you have it. Your output is a filter of the elements in wordlist_compound, which of course means you can use filter too:

print(list(filter(lambda li: li[0] in wl, wlc)))
# [['A', '0'], ['Aaronic', '1'], ['Aaronical', '3'], ['Aaronite', '4']]
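One caveat worth knowing: on Python 3, filter returns a lazy iterator rather than a list, so the result has to be materialized with list() before it can be printed or indexed like the output above. A minimal sketch, where lazy_matches and matches are names chosen just for this example:

```python
wl = ['A', 'Aani', 'Aaron', 'Aaronic', 'Aaronical', 'Aaronite',
      'Aaronitic', 'Aaru', 'Ab', 'Ababdeh']
wlc = [['A', '0'], ['Aaronic', '1'], ['Key', '2'],
       ['Aaronical', '3'], ['Aaronite', '4'], ['Yes', '5']]

# On Python 3 this is a lazy filter object, not a list.
lazy_matches = filter(lambda li: li[0] in wl, wlc)

# list() consumes the iterator and produces the concrete result.
matches = list(lazy_matches)
print(matches)
```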

Cheers.

OTHER TIPS

One problem is that len(wordlist) > len(wordlist_compound), so using an index from wordlist to index wordlist_compound will eventually raise an index out of range error.

Also, as @aga mentioned, the condition should be if word == wordlist_compound[index][0].

if word[index] == wordlist_compound[index][0]:

I believe it has to be

if word == wordlist_compound[index][0]:

The "string index out of range" exception comes from word[index]. For example, 'Aaru' is at index 7, and 'Aaru'[7] doesn't exist because the string has only four characters.
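To see where the indexing goes wrong, here is a small diagnostic sketch using the lists exactly as posted in the question. It tries each of the two lookups separately at every index and records the failures; the errors list is a name introduced only for this sketch.

```python
wordlist = ['A', 'Aani', 'Aaron', 'Aaronic', 'Aaronical', 'Aaronite',
            'Aaronitic', 'Aaru', 'Ab', 'Ababdeh']
wordlist_compound = [['A', '0'], ['Aaronic', '1'], ['Key', '2'],
                     ['Aaronical', '3'], ['Aaronite', '4'], ['Yes', '5']]

errors = []
for index, word in enumerate(wordlist):
    # Check each lookup on its own so one failure doesn't hide the other.
    try:
        word[index]  # indexes a character of the string
    except IndexError as exc:
        errors.append((index, word, str(exc)))
    try:
        wordlist_compound[index]  # indexes the shorter list
    except IndexError as exc:
        errors.append((index, word, str(exc)))

for entry in errors:
    print(entry)
```

Both kinds of IndexError show up: the string lookup fails once the index reaches the length of the word, and the list lookup fails once the index reaches len(wordlist_compound).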

But this observation won't help you, because your loop contains some logical errors. I would rewrite it like so:

for inner_list in wordlist_compound: 
    if inner_list[0] in wordlist: 
        wordlist_final.append(inner_list) 

Or use a list comprehension, as dawg has shown.

Depending on the size of the two collections, I'd probably do it like this:

word_numbers = dict(wordlist_compound)
wordlist_final = [(word, word_numbers[word]) for word in wordlist if word in word_numbers]

If you don't care about the order of the result (or if both lists are in the same order, in this case alphabetical) then you could instead do:

words = set(wordlist)
wordlist_final = [p for p in wordlist_compound if p[0] in words]

That would be the better option if wordlist_compound is likely to be significantly bigger than wordlist.

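I've just noticed that my first block returns a list of tuples, whereas you have a list of lists. You can fix that if necessary by changing the () to [] in that block. The second block returns the original elements of wordlist_compound, so with your data they are already lists; if the pairs were tuples, you could change p for p to list(p) for p.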
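Wrapped up as a reusable function, the set-based version might look like the sketch below. intersect_pairs is a hypothetical name, not something from the question; the sketch assumes every word is hashable and that the result should follow the order of wordlist_compound. list(pair) copies each pair, so the output is a list of lists even if the input holds tuples.

```python
def intersect_pairs(words, pairs):
    """Return the pairs whose first item appears in words, in pairs order."""
    known = set(words)  # set membership avoids rescanning the list per pair
    return [list(pair) for pair in pairs if pair[0] in known]


wordlist = ['A', 'Aani', 'Aaron', 'Aaronic', 'Aaronical', 'Aaronite',
            'Aaronitic', 'Aaru', 'Ab', 'Ababdeh']
wordlist_compound = [['A', '0'], ['Aaronic', '1'], ['Key', '2'],
                     ['Aaronical', '3'], ['Aaronite', '4'], ['Yes', '5']]

wordlist_final = intersect_pairs(wordlist, wordlist_compound)
print(wordlist_final)
```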

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow