Removing duplicates from the list of unicode strings

Question 1

You don't actually have duplicates in your list: one entry is http://catb.org and the other is http://www.catb.org.

You'll have to figure out a way to determine whether the URL has www. in front or not.
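One way to check for the www. prefix is to parse the URL and look at the host part rather than slicing the raw string. The following is only a sketch, written for Python 3 and using the standard urllib.parse module; the helper names has_www and strip_www are made up for illustration, and it ignores cases such as a user name or port in the host.

```python
from urllib.parse import urlsplit, urlunsplit


def has_www(url):
    """Return True if the host part of url starts with 'www.'."""
    return urlsplit(url).netloc.lower().startswith("www.")


def strip_www(url):
    """Return url with a leading 'www.' removed from its host, if present."""
    parts = urlsplit(url)
    host = parts.netloc
    if host.lower().startswith("www."):
        host = host[4:]  # drop the four characters of 'www.'
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
```

With these helpers, http://www.catb.org and http://catb.org both normalize to the same string, so they can then be compared as ordinary duplicates.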

Question 2

You can create a new list and add items to it if they're not already in it.

result = [...]  # placeholder: some list items
uniq = []
for item in result:
    if item not in uniq:
        uniq.append(item)

Question 3

You could use a set and then sort it by the original index:

sorted(set(result), key=result.index)

This works because list.index returns the position of the first occurrence, so the items stay in order of first appearance in the original list.
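A small worked example may make the ordering behaviour easier to see. The sample list here is invented for illustration, and the code is written for Python 3. Note that list.index scans the list each time it is called, so the sorted approach can get slow on long lists; the dict.fromkeys line is an alternative that relies on dictionaries preserving insertion order (guaranteed from Python 3.7).

```python
result = ["b", "a", "b", "c", "a"]

# set() drops the repeats but loses the order;
# sorting by first index in the original list restores it.
ordered = sorted(set(result), key=result.index)

# Alternative: dict keys are unique and keep insertion order,
# so one pass over the list gives the same answer.
ordered_single_pass = list(dict.fromkeys(result))

print(ordered)              # ['b', 'a', 'c']
print(ordered_single_pass)  # ['b', 'a', 'c']
```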

I also notice that one of the strings in your original list isn't a unicode string, so you might want to do something like:

u = [unicode(s) for s in result]
return sorted(set(u), key=u.index)

EDIT: 'http://google.com' and 'http://www.google.com' are not string duplicates. If you want to treat them as such, you could do something like:

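def remove_www(s):
    # Assumes s starts with u'http://'; converts to unicode first.
    s = unicode(s)
    prefix = u'http://'
    # u'http://www.' is 11 characters long, u'http://' is 7.
    suffix = s[11:] if s.startswith(u'http://www.') else s[7:]
    return prefix+suffix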

And then replace the earlier code with:

u = [remove_www(s) for s in result]
return sorted(set(u), key=u.index)

Question 4

Here is a method that modifies result in place:

result = [
    u'http://google.com',
    u'http://catb.org/~esr/faqs/hacker-howto.html',
    u'http://www.catb.org/~esr/faqs/hacker-howto.html',
    u'http://amazon.com',
    'http://www.catb.org/esr/faqs/hacker-howto.html',
    u'http://yahoo.com',
]
seen = set()
i = 0
while i < len(result):
    if result[i] not in seen:
        seen.add(result[i])
        i += 1
    else:
        del result[i]
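The loop above deletes items one at a time, and each del shifts every later element, which can add up on a long list with many duplicates. A possible variant, sketched here for Python 3, builds the list of kept items first and then writes it back with slice assignment, so the original list object is still the one being modified. The function name dedupe_in_place is hypothetical.

```python
def dedupe_in_place(items):
    """Remove repeated entries from items, keeping first occurrences."""
    seen = set()
    kept = []
    for item in items:
        if item not in seen:
            seen.add(item)
            kept.append(item)
    # Slice assignment replaces the contents of the same list object,
    # so other references to the list see the change as well.
    items[:] = kept


urls = ["http://google.com", "http://amazon.com", "http://google.com", "http://yahoo.com"]
alias = urls
dedupe_in_place(urls)
print(urls)  # ['http://google.com', 'http://amazon.com', 'http://yahoo.com']
```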