Dedupe and sort a list in Python 2.2
-
09-02-2021 - |
Question
In Python 2.2 (don't ask), what's the neatest way to sort a list and remove duplicates?
I can obviously write a function that would sort()
then iterate, but am wondering if there's an idiomatic one-liner.
edit: The list is short, so efficiency is not a concern. Also, the elements are immutable.
Solution
For old python versions, and since you're using strings, there's no one-liner I can think of, but a pattern would probably be this, using dictionaries:
def sorted_uniq(your_list):
table = {}
for s in your_list:
table[s] = None
k = table.keys()
k.sort()
return k
Adapted from an ancient ActiveState code snippet thread that Alex Martelli himself wrote several comments on: http://code.activestate.com/recipes/52560/
A shorter way with list comprehensions:
def sort_uniq(alist):
d = {}
mod_list = [d.setdefault(i,i) for i in alist if i not in d]
mod_list.sort()
return mod_list
Aside from Steven's neat (yet slightly unattractive) one liner, I think this heads toward the fewest lines and most idiomatic way of doing it with Python 2.2:
Thanks to Steven Rumbalski in the comments, the 2nd version can be condensed further with python's zip
function:
def sort_uniq(alist):
mod_list = dict(zip(alist,alist)).keys()
mod_list.sort()
return mod_list
If list.sort()
didn't operate by side effect, we'd have a one liner. ;)
OTHER TIPS
Idiomatic and a one-liner? No.
Here's a non-idiomatic butt-ugly one-liner.
>>> x = [4, 3, 3, 2, 4, 1]
>>> [y for y in (locals().__setitem__('d',{}) or x.sort() or x)
if y not in d and (d.__setitem__(y, None) or True)]
[1, 2, 3, 4]
If a simple two-liner is acceptable:
x = [4, 3, 3, 2, 4, 1]
x = dict(map(None,x,[])).keys()
x.sort()
Or make two small helper functions (works for any sequence):
def unique(it):
return dict(map(None,it,[])).keys()
def sorted(it):
alist = [item for item in it]
alist.sort()
return alist
print sorted(unique([4, 3, 3, 2, 4, 1]))
gives
[1, 2, 3, 4]
And finally, a semipythonic one liner:
x = [4, 3, 3, 2, 4, 1]
x.sort() or [s for s, t in zip(x, x[1:] + [None]) if s != t]
For the record, Python 2.2 does have sets, but under the "sets" module, so this will get you a long way:
from sets import Set
myList = list(Set(myList))
# now we're duplicate-free, a standard sorting might be enough
myList.sort()
Probably the best answer is to use a binary tree:
# Make yield work in Python 2.2
from __future__ import generators
class TreeNode(object):
def __init__(self, value):
self.left = None
self.right = None
self.value = value
def add(self, value):
if value == self.value:
return
if value < self.value:
if self.left is None:
self.left = TreeNode(value)
else:
self.left.add(value)
else:
if self.right is None:
self.right = TreeNode(value)
else:
self.right.add(value)
def __iter__(self):
if self.left is not None:
for value in self.left:
yield value
yield self.value
if self.right is not None:
for value in self.right:
yield value
class DedupeSorter(object):
def __init__(self):
self.root = None
def add(self, value):
if self.root is None:
self.root = TreeNode(value)
else:
self.root.add(value)
def __iter__(self):
if self.root is None:
return []
else:
return self.root.__iter__()
def dedupe_and_sort(l):
sorter = DedupeSorter()
for value in l:
sorter.add(value)
return list(sorter)
Definitely not idiomatic, but should be fast. It basically creates a tree-based set and iterates over it. I don't have Python 2.2 so hopefully it works :p