This simplifies the for loop in the match_strings function, but it didn't noticeably increase the speed in my tests. Most of the time is lost in the two for loops over lastnames and fullnames.
import Levenshtein  # python-Levenshtein package


def match_strings(lastname, listofnames):
    # Collect indices into listofnames itself (not into a filtered copy),
    # so they can be used to remove the match below. The cheap first-character
    # check runs before the more expensive edit distance.
    matchedidx = [index for index, name in enumerate(listofnames)
                  if name[:1] == lastname[:1]
                  and Levenshtein.distance(lastname, name) < 2]
    if len(matchedidx) == 1:
        # Exactly one close match: return the list without it
        newnamelist = [name for index, name in enumerate(listofnames)
                       if index != matchedidx[0]]
        return 1, newnamelist
    return 0, listofnames
You might have to sort the list of known last names and split them into a dict keyed by their starting character, then match each name in the list of names only against the entries that share its first character.
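A minimal sketch of that bucketing idea, assuming the known last names are plain strings. edit_distance is a hypothetical pure-Python stand-in for Levenshtein.distance so the snippet runs without python-Levenshtein; bucket_by_initial and close_matches are likewise illustrative names. The default max_distance of 1 mirrors the "< 2" threshold used in match_strings.

```python
from collections import defaultdict


def edit_distance(a, b):
    """Levenshtein distance; a hypothetical stand-in for Levenshtein.distance."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # delete ca
                               current[j - 1] + 1,             # insert cb
                               previous[j - 1] + (ca != cb)))  # substitute
        previous = current
    return previous[-1]


def bucket_by_initial(names):
    """Sort the names and group them in a dict keyed by first character."""
    buckets = defaultdict(list)
    for name in sorted(names):
        if name:  # empty strings have no first character
            buckets[name[0]].append(name)
    return dict(buckets)


def close_matches(name, buckets, max_distance=1):
    """Compare name only against known names sharing its first character."""
    if not name:
        return []
    return [known for known in buckets.get(name[0], [])
            if edit_distance(name, known) <= max_distance]


known_lastnames = ["Smith", "Smyth", "Jones", "Johnson", "Brown"]
buckets = bucket_by_initial(known_lastnames)
print(close_matches("Jonas", buckets))  # ['Jones']
```

Building the buckets is a one-time cost; after that each lookup only computes edit distances against names with the same initial instead of the whole list.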
Assuming the fullnames list always has the first name as its first element, you could limit the comparison to the remaining elements.
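A rough sketch of skipping the first name, assuming fullnames is a list in which each full name is a sequence of name parts with the first name at index 0. find_in_fullnames is a hypothetical helper, and is_match is any comparison you pass in; with python-Levenshtein it could be lambda a, b: Levenshtein.distance(a, b) < 2, while the example below uses exact equality to keep it dependency-free.

```python
from typing import Callable, Iterable, List, Sequence


def find_in_fullnames(lastname: str,
                      fullnames: Iterable[Sequence[str]],
                      is_match: Callable[[str, str], bool]) -> List[int]:
    """Return the indices of full names whose non-first parts match lastname.

    Element 0 of each full name is assumed to be the first name and is skipped.
    """
    return [index for index, fullname in enumerate(fullnames)
            if any(is_match(lastname, part) for part in fullname[1:])]


fullnames = [["Smith", "John"], ["John", "Smith"], ["Anna", "Lee", "Smith"]]
print(find_in_fullnames("Smith", fullnames, lambda a, b: a == b))  # [1, 2]
```

The first entry is not reported even though it contains "Smith", because that part sits in the first-name position and is never compared.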