You have non-unique labels; you can use a defaultdict combined with a counter to generate numbers on first access:
from collections import defaultdict
from itertools import count
from functools import partial
label_to_number = defaultdict(partial(next, count(1)))
[(label_to_number[label], label) for label in labels]
This generates a count in order of each label's first occurrence in labels.
Demo:
>>> labels = ["brown", "black", "blue", "brown", "brown", "black"]
>>> label_to_number = defaultdict(partial(next, count(1)))
>>> [(label_to_number[label], label) for label in labels]
[(1, 'brown'), (2, 'black'), (3, 'blue'), (1, 'brown'), (1, 'brown'), (2, 'black')]
Because we are using a dictionary, each label-to-number lookup takes constant time on average, so the whole operation takes time linear in the length of the labels list.
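If the defaultdict(partial(next, count(1))) construction looks opaque, the following sketch may help. It compares it against a plain dictionary filled by an explicit loop; the names explicit and counter are only illustrative and not part of the approach above. The defaultdict calls its factory with no arguments whenever a key is missing, and partial(next, counter) then returns the next integer from the counter.

```python
from collections import defaultdict
from functools import partial
from itertools import count

labels = ["brown", "black", "blue", "brown", "brown", "black"]

# Explicit version: hand out the next number only when a label is new.
explicit = {}
for label in labels:
    if label not in explicit:
        explicit[label] = len(explicit) + 1

# defaultdict version: the factory is called (with no arguments) only for
# missing keys, so each new label pulls the next value from the counter.
counter = count(1)
label_to_number = defaultdict(partial(next, counter))
for label in labels:
    label_to_number[label]  # the lookup alone creates the entry

# Both approaches should produce the same mapping.
print(dict(label_to_number) == explicit)
```

Note that any lookup of an unseen label adds it to the defaultdict, so use label_to_number.get(label) or "label in label_to_number" when you only want to test for membership.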
Alternatively, use a set() to get unique values, then map these to an enumerate() count:
label_to_number = {label: i for i, label in enumerate(set(labels), 1)}
[(label_to_number[label], label) for label in labels]
This assigns numbers more arbitrarily, as set() objects are not ordered, so your output may differ from this:
>>> label_to_number = {label: i for i, label in enumerate(set(labels), 1)}
>>> [(label_to_number[label], label) for label in labels]
[(2, 'brown'), (3, 'black'), (1, 'blue'), (2, 'brown'), (2, 'brown'), (3, 'black')]
This requires looping through labels twice, though.
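If you like the enumerate() form but want the numbering to follow first occurrence, one possible variation (a swapped-in technique, not what is shown above) is to deduplicate with dict.fromkeys() instead of set(). This sketch assumes Python 3.7 or newer, where dictionaries are guaranteed to keep insertion order; unique_in_order and numbered are illustrative names.

```python
labels = ["brown", "black", "blue", "brown", "brown", "black"]

# dict.fromkeys() drops duplicates but keeps first-occurrence order
# (insertion order is guaranteed for dicts from Python 3.7 on).
unique_in_order = dict.fromkeys(labels)

# Number the unique labels starting at 1, in that order.
label_to_number = {label: i for i, label in enumerate(unique_in_order, 1)}

numbered = [(label_to_number[label], label) for label in labels]
print(numbered)
```

This still loops through labels twice, but the numbers are deterministic and match the defaultdict result.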
Neither approach requires you to first define a dictionary of labels; the mapping is created automatically.