Question

I want to group the items of a list by their last 4 digits and, for each group, print the longest item.

For example:

lst = ['abcd1234','abcdabcd1234','gqweri7890','poiupoiupoiupoiu7890']
# want to return abcdabcd1234 and poiupoiupoiupoiu7890

In this case, we print the longer of the elements ending in 1234 and the longer of the elements ending in 7890. Finding the longest element with one particular ending is not hard, but doing it efficiently for every distinct ending in the list seems difficult.

My attempt was to first identify all the distinct last 4 digits using slicing and a set:

ids=[]
for x in lst:
    ids.append(x[-4:])
ids = list(set(ids))

Next, I would search through the list by index, using a "max_length" variable and a "current_id" variable to find the longest element for each id. This seems very inefficient, and I was wondering what the best way to do this would be.
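For comparison, here is a minimal sketch of the two-step approach described above, using `max(..., key=len)` in place of the manual `max_length`/`current_id` bookkeeping. The function name `longest_per_id_naive` is hypothetical. It rescans the whole list once per distinct ending, so it does about n × k work for n items and k distinct endings, which is where the inefficiency comes from.

```python
def longest_per_id_naive(items):
    # Step 1: collect the distinct last-4-character endings.
    ids = set(x[-4:] for x in items)
    # Step 2: for each ending, rescan the whole list and keep the longest match.
    return {i: max((x for x in items if x[-4:] == i), key=len) for i in ids}


lst = ['abcd1234', 'abcdabcd1234', 'gqweri7890', 'poiupoiupoiupoiu7890']
result = longest_per_id_naive(lst)
print(sorted(result.values()))  # ['abcdabcd1234', 'poiupoiupoiupoiu7890']
```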

Solution

Use a dictionary:

>>> lst = ['abcd1234','abcdabcd1234','gqweri7890','poiupoiupoiupoiu7890']
>>> d = {} # maps each 4-character ending to the longest item seen so far
>>> for item in lst:
...     key = item[-4:] # last 4 characters
...     d[key] = max(d.get(key, ''), item, key=len)
...
>>> d.values() # list(d.values()) in Python 3.x
['abcdabcd1234', 'poiupoiupoiupoiu7890']
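The same dictionary approach can be wrapped in a reusable Python 3 function. This is a sketch; the name `longest_by_suffix` and the suffix-length parameter `n` are illustrative additions, and `n` is assumed to be at least 1 (with `n=0`, `item[-0:]` is the whole string). On a tie in length the item seen first is kept, because `max` returns the first of several equally long arguments.

```python
def longest_by_suffix(items, n=4):
    """Map each n-character ending to the longest item that has it."""
    d = {}
    for item in items:
        key = item[-n:]  # last n characters
        # '' is the starting value; on equal lengths max() keeps the earlier item.
        d[key] = max(d.get(key, ''), item, key=len)
    return d


lst = ['abcd1234', 'abcdabcd1234', 'gqweri7890', 'poiupoiupoiupoiu7890']
print(list(longest_by_suffix(lst).values()))
# ['abcdabcd1234', 'poiupoiupoiupoiu7890']  (dicts keep insertion order in Python 3.7+)
```

This is still a single pass over the list, with one dictionary lookup and one length comparison per item.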

OTHER TIPS

from collections import defaultdict
d = defaultdict(str)  # missing keys default to '' (length 0)
lst = ['abcd1234','abcdabcd1234','gqweri7890','poiupoiupoiupoiu7890']
for x in lst:
    key = x[-4:]  # last 4 characters
    if len(x) > len(d[key]):
        d[key] = x

To display the results:

for key, value in d.items():
    print(key, '=', value)

which produces:

1234 = abcdabcd1234
7890 = poiupoiupoiupoiu7890

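itertools is great. Use groupby with a lambda to group the items by their endings, and from there it is easy. Note that groupby only groups consecutive items with equal keys, so sort by the same key first (the example list happens to be in order already, but a general list may not be):

>>> from itertools import groupby
>>> lst = ['abcd1234','abcdabcd1234','gqweri7890','poiupoiupoiupoiu7890']
>>> last4 = lambda s: s[-4:]
>>> [max(y, key=len) for x, y in groupby(sorted(lst, key=last4), last4)]
['abcdabcd1234', 'poiupoiupoiupoiu7890']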
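To see why sorting matters with groupby, here is a small sketch that runs the grouping on a list where the two `1234` items are not adjacent. The helper `last4` and the reordered list `shuffled` are illustrative, not from the question. Sorting also costs O(n log n), whereas the dictionary answers make a single O(n) pass.

```python
from itertools import groupby


def last4(s):
    return s[-4:]


# Same items as the question, but the two "1234" items are no longer adjacent.
shuffled = ['abcd1234', 'gqweri7890', 'abcdabcd1234', 'poiupoiupoiupoiu7890']

# Without sorting, groupby starts a new group every time the key changes,
# so "1234" yields two separate groups and 'abcd1234' wrongly survives.
unsorted_result = [max(g, key=len) for _, g in groupby(shuffled, key=last4)]
print(unsorted_result)
# ['abcd1234', 'gqweri7890', 'abcdabcd1234', 'poiupoiupoiupoiu7890']

# Sorting by the same key first makes equal endings adjacent.
sorted_result = [max(g, key=len) for _, g in groupby(sorted(shuffled, key=last4), key=last4)]
print(sorted_result)
# ['abcdabcd1234', 'poiupoiupoiupoiu7890']
```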

Slightly more generic: group by all the digits in each item rather than by a fixed last-4 slice.

import collections
import string

lst = ['abcd1234','abcdabcd1234','gqweri7890','poiupoiupoiupoiu7890']
# Pair each item with all of its digits joined together, e.g. ('1234', 'abcd1234').
z = [(''.join(c for c in s if c in string.digits), s) for s in lst]
groups = collections.defaultdict(list)
for a, b in z:
    groups[a].append(b)

for k in groups:
    print(k, max(groups[k], key=len))

1234 abcdabcd1234
7890 poiupoiupoiupoiu7890

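If "more generic" should instead mean "the run of digits at the end, whatever its length", one option is a regular expression anchored at the end of the string. This is a minimal sketch under that assumption; the function name `longest_by_trailing_digits` and the extra item `'xyz567'` are hypothetical. Unlike the answer above, which collects every digit in the item, it ignores digits that appear earlier in the string, and items with no trailing digits all share the key `''`.

```python
import re
from collections import defaultdict


def longest_by_trailing_digits(items):
    # Key: the run of digits at the very end of each item ('' if there is none).
    groups = defaultdict(list)
    for item in items:
        m = re.search(r'\d+$', item)
        groups[m.group() if m else ''].append(item)
    # Keep the longest member of each group.
    return {k: max(v, key=len) for k, v in groups.items()}


lst = ['abcd1234', 'abcdabcd1234', 'gqweri7890', 'poiupoiupoiupoiu7890', 'xyz567']
for k, v in sorted(longest_by_trailing_digits(lst).items()):
    print(k, v)
# 1234 abcdabcd1234
# 567 xyz567
# 7890 poiupoiupoiupoiu7890
```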
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow