How do I keep the index of the duplicate element unchanged

https://stackoverflow.com/questions/11490393

20-06-2021
|

Question

Here is a input list:

['a', 'b', 'b', 'c', 'c', 'd']

The output I expect should be:

[[0, 'a'], [1, 'b'],  [1, 'b'], [2, 'c'], [2, 'c'], [3, 'd']]

I try to use map()

>>> map(lambda (index, word): [index, word], enumerate([['a', 'b', 'b', 'c', 'c', 'd']])
[[0, 'a'], [1, 'b'], [2, 'b'], [3, 'c'], [4, 'c'], [5, 'd']]

How can I get the expected result?

EDIT: This is not a sorted list, the index of each element increase only when meet a new element

Solution

It sounds like you want to rank the terms based on a lexicographical ordering.

input = ['a', 'b', 'b', 'c', 'c', 'd']
mapping = { v:i for (i, v) in enumerate(sorted(set(input))) }
[ [mapping[v], v] for v in input ]

Note that this works for unsorted inputs as well.

If, as your amendment suggests, you want to number items based on order of first appearance, a different approach is in order. The following is short and sweet, albeit offensively hacky:

[ [d.setdefault(v, len(d)), v] for d in [{}] for v in input ]

OTHER TIPS

>>> import itertools
>>> seq = ['a', 'b', 'b', 'c', 'c', 'd']
>>> [[i, c] for i, (k, g) in enumerate(itertools.groupby(seq)) for c in g]
[[0, 'a'], [1, 'b'], [1, 'b'], [2, 'c'], [2, 'c'], [3, 'd']]

[
    [i, x]
    for i, (value, group) in enumerate(itertools.groupby(['a', 'b', 'b', 'c', 'c', 'd']))
    for x in group
]

When list is sorted use groupby (see jamylak answer); when not, just iterate over the list and check if you've seen this letter already:

a = ['a', 'b', 'b', 'c', 'c', 'd']
result = []
d = {}
n = 0
for k in a:
  if k not in d:
     d[k] = n
     n += 1
  result.append([d[k],k])

It is the most effective solution; it takes only O(n) time.

Example of usage for unsorted lists:

[[0, 'a'], [1, 'b'], [1, 'b'], [2, 'c'], [2, 'c'], [3, 'd'], [0, 'a']]

As you can see, you have here the same order of items as in the input list.

When you sort the list first you need O(n*log(n)) additional time.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow