Ordering a list of items in place to compare clustering solutions

Question 1

I think I understand your problem, something like the function below would work, I think you'd have a hard time doing this in less than O(n) time.

def normalize_groups(in_list):
    groups = {}
    output = []
    for value in in_list:
        if value not in groups:
            groups[value] = len(groups)
        output.append(groups[value])
    return output

which would return for both lists:

In [52]: normalize_groups(l1)
Out[52]: [0, 0, 0, 1, 1, 2, 2, 1, 1]

In [53]: normalize_groups(l2)
Out[53]: [0, 0, 0, 1, 1, 2, 2, 1, 1]

EDIT: nevermind, just remove the string part completely.

Question 2

You should just be able to traverse L2 and create a mapping to L1 as you go.

mapping = {}
for i, (a, b) in enumerate(zip(l1, l2)):
  if b in mapping:
    l2[i] = mapping[b]
  else:
    mapping[b] = a
    l2[i] = a

 # Now l2 = [1, 1, 1, 2, 2, 3, 3, 2, 2].  Can compare for equality.

This is probably better referred to as label reassignment. It's relatively efficient -- it uses iterators, quick dictionary lookups, and is O(n).

Question 3

It looks like your list shows which group each item belongs to, and you want a canonical group-labelling (ie ["A", "B", "C"] == ["X", "Y", "Z"]).

Note that this only works for exact match-lists.

from itertools import count

def canonicize(lst):
    top  = count(1)
    seen = {}
    out  = []
    for item in lst:
        try:
            out.append(seen[item])
        except KeyError:
            val = next(top)
            seen[item] = val
            out.append(val)
    return out

canonicize([1,1,1,2,2,3,3,2,2])   # =>  [1, 1, 1, 2, 2, 3, 3, 2, 2]
canonicize([2,2,2,1,1,3,3,1,1])   # =>  [1, 1, 1, 2, 2, 3, 3, 2, 2]

Question 4

What I would do is simply get all indexes for each value:

>>> [[i for i, x in enumerate(l1) if x == j] for j in set(l1)]
[[0, 1, 2], [3, 4, 7, 8], [5, 6]]
>>> [[i for i, x in enumerate(l2) if x == j] for j in set(l2)]
[[0, 1, 2], [3, 4, 7, 8], [5, 6]]

So you simply have to compare the 2 new lists.