Finding equality between different strings that should be equal

Question 1

You could have some kind of equivalence mapping:

equivalents = {"Arsenal": ["ARS",], 
               "Manchester United": ["MNU", "ManUtd"], ...}

And use this to process your data:

>>> name = "ManUtd"
>>> for main, equivs in equivalents.items():
...     if name == main or name in equivs:
...         name = main
...         break
...
>>> name
'Manchester United'

This allows you to easily see what you consider to be the "canonical name" for the team (i.e. the key) and other names that are considered to be the same team (i.e. the list value).
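
If you need this lookup in more than one place, the loop can be wrapped in a function. This is only a sketch: the name canonical_name, the sample data, and the choice to raise ValueError for unknown names are my assumptions, not part of the question.

```python
# Assumed sample data; extend with your own teams.
EQUIVALENTS = {
    "Arsenal": ["ARS"],
    "Manchester United": ["MNU", "ManUtd"],
}

def canonical_name(name, equivalents=EQUIVALENTS):
    """Return the key that name belongs to (name may itself be a key)."""
    for main, equivs in equivalents.items():
        if name == main or name in equivs:
            return main
    # Assumption: unknown names are treated as an error rather than passed through.
    raise ValueError("Unknown team name: %r" % name)

print(canonical_name("ManUtd"))  # Manchester United
```

Two names then refer to the same team when canonical_name returns the same key for both.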

If you do go down the class route, you should make the list of team tuples a class attribute:

class Team:

    TEAMS = [("Arsenal", "ARS"), ("Manchester United", "MNU", "ManUtd"), ...]

    def __init__(self, name):
        if not any(name in names for names in self.TEAMS):
            raise ValueError("Not a valid team name.")
        self.name = name

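    def __eq__(self, other):
        # Comparing with a non-Team object should not raise AttributeError.
        if not isinstance(other, Team):
            return NotImplemented
        return any(self.name in names and other.name in names
                   for names in self.TEAMS)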

The output from this:

>>> mnu1 = Team("ManUtd")
>>> mnu2 = Team("MNU")
>>> mnu1 == mnu2
True

>>> ars = Team("ARS")
>>> ars == mnu1
False

>>> fail = Team("Not a name")
Traceback (most recent call last):
  File "<pyshell#49>", line 1, in <module>
    fail = Team("Not a name")
  File "<pyshell#43>", line 7, in __init__
    raise ValueError("Not a valid team name.")
ValueError: Not a valid team name.
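
One thing to be aware of: in Python 3, a class that defines __eq__ without defining __hash__ becomes unhashable, so these Team objects cannot be put in a set or used as dict keys. A possible way around that, assuming each name appears in exactly one tuple, is to remember the matching tuple and hash on it. The _names attribute is a hypothetical addition of mine:

```python
class Team:
    # Assumed sample data, mirroring the class above.
    TEAMS = [("Arsenal", "ARS"), ("Manchester United", "MNU", "ManUtd")]

    def __init__(self, name):
        for names in self.TEAMS:
            if name in names:
                self.name = name
                self._names = names  # hypothetical: the tuple this name belongs to
                return
        raise ValueError("Not a valid team name.")

    def __eq__(self, other):
        if not isinstance(other, Team):
            return NotImplemented
        return self._names == other._names

    def __hash__(self):
        # Equal teams share the same tuple, so they also share a hash.
        return hash(self._names)

print(len({Team("MNU"), Team("ManUtd"), Team("ARS")}))  # 2
```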

Alternatively, just a simple function would do the same job if your Team won't have other attributes:

def equivalent(team1, team2):
    teams = [("Arsenal", "ARS"), ("Manchester United", "MNU", "ManUtd"), ...]
    for names in teams:
        if team1 in names and team2 in names:
            return True
    return False

Output from this:

>>> equivalent("MNU", "ManUtd")
True
>>> equivalent("MNU", "Arsenal")
False
>>> equivalent("MNU", "Not a name")
False

Question 2

For simplicity, you can just put everything in a flat canonical lookup:

canonical = {'Arsenal':'ARS',
             'ARS':'ARS',
             'Manchester United':'MNU',
             'MNU':'MNU',
             'ManUtd':'MNU',
             ...}

Then equivalence testing is easy:

if canonical[x] == canonical[y]:
    pass  # they're the same team

There are a lot of good alternative answers here, so here is the broad picture: this approach works well if you never expect your canonical lookup to change, because you can generate it once and then forget about it. If it changes frequently, it is going to be miserable to maintain, so you should look elsewhere.
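
As a rough sketch of "generate it once", the flat lookup can be built from a grouped table instead of being typed out by hand. GROUPS and same_team are names I made up for illustration, and returning False for unknown names (rather than raising KeyError, as plain indexing would) is my assumption:

```python
# Assumed sample data: canonical code -> every accepted spelling.
GROUPS = {
    "ARS": ["Arsenal", "ARS"],
    "MNU": ["Manchester United", "MNU", "ManUtd"],
}

# Build the flat name -> code lookup once.
canonical = {name: code for code, names in GROUPS.items() for name in names}

def same_team(x, y):
    """True only if both names are known and map to the same code."""
    code = canonical.get(x)
    return code is not None and code == canonical.get(y)

print(same_team("ManUtd", "Manchester United"))  # True
print(same_team("MNU", "Arsenal"))               # False
```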

Question 3

roippi's code can be made more maintainable if you define a function that inverts the dictionary:

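def invertdict(d):
    inverted = {}  # avoid the name "id", which shadows a builtin
    for key, value in d.items():
        for part in value:
            # append key to the tuple already collected for part (if any)
            inverted[part] = inverted.get(part, ()) + (key,)
    return inverted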

If you do it this way, the values of canonical have to be defined as tuples:

canonical = {'Arsenal':('ARS',),
             'ARS':('ARS',),
             'Manchester United':('MNU',),
             'MNU':('MNU',),
             'ManUtd':('MNU',)}

but then you can simply:

print(invertdict(canonical))
{'ARS': ('ARS', 'Arsenal'), 'MNU': ('ManUtd', 'Manchester United', 'MNU')}
print(invertdict(invertdict(canonical)))
{'MNU': ('MNU',), 'Manchester United': ('MNU',), 'ARS': ('ARS',), 'Arsenal': ('ARS',), 'ManUtd': ('MNU',)}
# this is canonical again (the order of keys and tuple items may differ between Python versions)

You may then want to define the inverted mapping (each code with its tuple of names) at the start and use invertdict to derive canonical from it, so that you can compare your teams.

hope it helps

Question 4

What I would do:

class Team:
    def __init__(self, name, all_names):
        self.name = name  # use name as its "proper" name
        self.all_names = all_names  # a list of all acceptable names and abbreviations

man = Team('Manchester United', ['Manchester United', 'MNU', 'ManUtd'])

You could then test for a match with: if 'MNU' in man.all_names
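
With more than one team you would probably keep the Team objects in a list and search it. A minimal sketch, where TEAMS and find_team are hypothetical names and the data is only sample data:

```python
class Team:
    def __init__(self, name, all_names):
        self.name = name
        self.all_names = all_names

# Assumed sample data.
TEAMS = [
    Team("Arsenal", ["Arsenal", "ARS"]),
    Team("Manchester United", ["Manchester United", "MNU", "ManUtd"]),
]

def find_team(alias, teams=TEAMS):
    """Return the first Team that accepts alias, or None if there is none."""
    return next((t for t in teams if alias in t.all_names), None)

print(find_team("MNU").name)  # Manchester United
```

Two aliases then name the same team when find_team returns the same object for both.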

Question 5

I think the best way to do it is close to what you have, using a list of tuples of all the correlated names.

def same_team(self, teamA, teamB):
    # Not named __eq__: the == operator passes only one argument besides self.
    for names in self.teams:
        if teamA in names:
            # teamA found, so teamB must be in this same tuple
            return teamB in names
    # teamA is not in any tuple; never fall through to the last entry
    return False

This has an advantage over using a dict or list of abbreviations: it doesn't matter which version of the name is used as the "key".

You could attempt to automate the matching with something like this:

def fuzzy_match(self, teamA, teamB):
    # Not named __eq__: the == operator passes only one argument besides self.
    a, b = teamA.lower(), teamB.lower()
    if len(a) == len(b):
        return a == b
    shorter, longer = sorted((a, b), key=len)
    # every letter of the shorter name must occur somewhere in the longer one
    return all(letter in longer for letter in shorter)

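Note that this method won't be perfect, since it requires all the letters of the abbreviation to be in the full version (which may not always be the case). It also ignores the order of the letters, so unrelated names can match: every letter of "ARS" occurs in "Manchester United", for example. You could build a more sophisticated matching scheme than what I have here, which would give more reliable results.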
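
One possible stricter rule, offered only as a sketch, is to require the letters of the abbreviation to appear in the full name in the same order. The function name is_abbreviation is hypothetical, and this still will not catch abbreviations that use letters missing from the full name:

```python
def is_abbreviation(abbr, full):
    """True if the letters of abbr occur in full in the same order (case-insensitive)."""
    remaining = iter(full.lower())
    # "ch in remaining" consumes the iterator up to the match, so each
    # letter is only searched for after the position of the previous one.
    return all(ch in remaining for ch in abbr.lower())

print(is_abbreviation("MNU", "Manchester United"))  # True
print(is_abbreviation("ARS", "Manchester United"))  # False
```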