I have a small sorting problem. I'm comparing a couple of thousands of clustering solutions to find and count identical solutions. The problem I have is the following:

Given two clustering solutions:

l1 = [1,1,1,2,2,3,3,2,2]
l2 = [2,2,2,1,1,3,3,1,1]

These two solutions are identical, but the group labels are different. How do I sort l2 to make it look like l1? I want to do this to make hashing easier.

有帮助吗?

解决方案

I think I understand your problem, something like the function below would work, I think you'd have a hard time doing this in less than O(n) time.

def normalize_groups(in_list):
    groups = {}
    output = []
    for value in in_list:
        if value not in groups:
            groups[value] = len(groups)
        output.append(groups[value])
    return output

which would return for both lists:

In [52]: normalize_groups(l1)
Out[52]: [0, 0, 0, 1, 1, 2, 2, 1, 1]

In [53]: normalize_groups(l2)
Out[53]: [0, 0, 0, 1, 1, 2, 2, 1, 1]

EDIT: nevermind, just remove the string part completely.

其他提示

You should just be able to traverse L2 and create a mapping to L1 as you go.

mapping = {}
for i, (a, b) in enumerate(zip(l1, l2)):
  if b in mapping:
    l2[i] = mapping[b]
  else:
    mapping[b] = a
    l2[i] = a

 # Now l2 = [1, 1, 1, 2, 2, 3, 3, 2, 2].  Can compare for equality.

This is probably better referred to as label reassignment. It's relatively efficient -- it uses iterators, quick dictionary lookups, and is O(n).

It looks like your list shows which group each item belongs to, and you want a canonical group-labelling (ie ["A", "B", "C"] == ["X", "Y", "Z"]).

Note that this only works for exact match-lists.

from itertools import count

def canonicize(lst):
    top  = count(1)
    seen = {}
    out  = []
    for item in lst:
        try:
            out.append(seen[item])
        except KeyError:
            val = next(top)
            seen[item] = val
            out.append(val)
    return out

canonicize([1,1,1,2,2,3,3,2,2])   # =>  [1, 1, 1, 2, 2, 3, 3, 2, 2]
canonicize([2,2,2,1,1,3,3,1,1])   # =>  [1, 1, 1, 2, 2, 3, 3, 2, 2]

What I would do is simply get all indexes for each value:

>>> [[i for i, x in enumerate(l1) if x == j] for j in set(l1)]
[[0, 1, 2], [3, 4, 7, 8], [5, 6]]
>>> [[i for i, x in enumerate(l2) if x == j] for j in set(l2)]
[[0, 1, 2], [3, 4, 7, 8], [5, 6]]

So you simply have to compare the 2 new lists.

许可以下: CC-BY-SA归因
不隶属于 StackOverflow
scroll top