Question

I used the Python package ahocorasick (https://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/) to match state names in text:

import ahocorasick
states = {
    'AK': 'Alaska',
    'AL': 'Alabama',
    'AR': 'Arkansas',
    'AS': 'American Samoa',
    'AZ': 'Arizona',
    'CA': 'California',
    'CO': 'Colorado',
    'CT': 'Connecticut'
}

def LoadKeywords(keywords):
    # keywords should be a list of strings
    tree = ahocorasick.KeywordTree()
    for k in keywords:
        tree.add(k)
    tree.make()
    return tree

keywordLong = states.values()
keywordLongTree = LoadKeywords(keywordLong)

Then I run a search:

keywordLongTree.search("Alabama")

It returns:

(0, 7)

That is the expected result. But when I search a string that contains no state name:

keywordLongTree.search("I don't know why this happen")

it should return None, but instead it returns:

(145331, 145335)

Has anyone run into this before? Why does this happen?


Solution

I encountered exactly the same issue. It appears to be a defect in the module; after all, it hasn't been modified since 2005. I used esmre (https://code.google.com/p/esmre/) instead, and it worked fine. Give it a try!
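If you would rather not depend on a C extension at all, the same kind of lookup can be sketched in pure Python. The following is only a minimal illustration of the Aho-Corasick technique, not the code of either library: the names build_automaton and search are made up for this example, and the (start, end) return value is chosen to mirror the tuples shown in the question. It returns the first match found while scanning left to right, and None when nothing matches.

```python
from collections import deque


def build_automaton(keywords):
    """Build a trie with failure links (hypothetical helper, not a library API)."""
    goto = [{}]   # goto[node] maps a character to the next node
    fail = [0]    # fail[node] is the node to fall back to on a mismatch
    out = [[]]    # out[node] lists the keywords that end at this node

    # Phase 1: insert every keyword into the trie.
    for word in keywords:
        node = 0
        for ch in word:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(word)

    # Phase 2: breadth-first pass to compute failure links.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit keywords that end at the fallback node.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def search(automaton, text):
    """Return (start, end) of the first match, or None if there is none."""
    goto, fail, out = automaton
    node = 0
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        if out[node]:
            word = max(out[node], key=len)  # longest keyword ending here
            return (i + 1 - len(word), i + 1)
    return None


states = {
    'AK': 'Alaska',
    'AL': 'Alabama',
    'AR': 'Arkansas',
    'AS': 'American Samoa',
    'AZ': 'Arizona',
    'CA': 'California',
    'CO': 'Colorado',
    'CT': 'Connecticut'
}

tree = build_automaton(states.values())
print(search(tree, "Alabama"))                       # (0, 7)
print(search(tree, "I don't know why this happen"))  # None
```

Matching here is case-sensitive, and a pure-Python loop like this will be much slower than a compiled module on large inputs, so treat it as a way to cross-check results rather than as a drop-in replacement.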

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow