How to test for unique strings and duplicate strings (different case) using Python

https://stackoverflow.com/questions/21943816

14-10-2022

Question

I have a problem that is probably fairly easy for most of you, but it has proved a little difficult for me because I have not done this kind of comparison before. The following is the portion of the XML file I am parsing. I get back a list of strings (the text of each name element) and I want to determine two things. First, I want to see whether the names are unique. Second, I want to know whether any name is duplicated with different case (same name, different capitalization). What would be the best way to tackle this? I don't expect the list to be very big. Here is my XML snippet and current code:

    <actions>
        <action>
            <name>Action_1</name>
        </action>
        <action>
            <name>action_1</name>
        </action>
        <action>
            <name>Action_2</name>
        </action>
        <action>
            <name>ACTION_2</name>
        </action>
    </actions>

    action = elementTree.findall('./actions/action')
    nameList = []

    # Get the list of actions and stuff them in a list for further comparison.
    for a in action:
        for child in a:
            if child.tag == 'name':
                nameList.append(child.text)
                print(child.text)

Output is as follows:

Action_1
action_1
Action_2
ACTION_2

So again, I just need to determine whether the strings (name.text) I got back are unique, and whether any name is duplicated with different case.

Solution

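from collections import defaultdict, Counter

d1 = Counter()         # exact (case-sensitive) occurrence counts
d2 = defaultdict(set)  # lowercased name -> set of spellings seen

# count appearances of entries
for x in nameList:
    d1[x] += 1
    d2[x.lower()].add(x)

# exact duplicates: names that appear more than once
for k, v in d1.items():
    if v > 1:
        print(k)

# names that appear with more than one capitalization
for k, v in d2.items():
    if len(v) > 1:
        print(k)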

If you have a …long… list, take a look at a Bloom filter.
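
As a minimal sketch, the Counter/defaultdict approach above could be wrapped in a function that returns both results instead of printing them. The function name analyse_names and the sample list (which adds one exact repeat of 'Action_2' to the names from the question) are hypothetical, chosen only for illustration, and the code assumes Python 3.

```python
from collections import Counter, defaultdict


def analyse_names(names):
    """Return (exact_duplicates, case_variants) for a list of strings.

    analyse_names is a hypothetical helper, not part of the original answer.
    """
    exact_counts = Counter(names)   # case-sensitive occurrence counts
    variants = defaultdict(set)     # lowercased name -> spellings seen
    for name in names:
        variants[name.lower()].add(name)

    # Names that occur more than once with identical spelling.
    exact_duplicates = sorted(n for n, c in exact_counts.items() if c > 1)
    # Names that occur with more than one capitalization.
    case_variants = {k: sorted(v) for k, v in variants.items() if len(v) > 1}
    return exact_duplicates, case_variants


# Hypothetical sample data: the question's names plus one exact repeat.
name_list = ['Action_1', 'action_1', 'Action_2', 'ACTION_2', 'Action_2']
dupes, variants = analyse_names(name_list)
print(dupes)
print(variants)
```

An empty first result means no spelling repeats exactly; an empty second result means no name appears in more than one capitalization.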

Other tips

If names should be compared case-insensitively, store the .lower() form of each name. You can then use the in operator to test for membership in the list:

if child.tag == 'name':
    text_lower = child.text.lower()
    if text_lower in nameList:
        print('dupe!')
    else:
        nameList.append(text_lower)
    print(child.text)
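
The fragment above depends on the loop from the question. A self-contained sketch of the same idea might look like the following, assuming Python 3, assuming the XML snippet is parsed with actions as the root element (in the question it is only a portion of a larger file), and using a set rather than a list so that membership tests stay fast. The names XML, seen, and case_dupes are illustrative.

```python
import xml.etree.ElementTree as ET

# Assumption: the snippet from the question, with <actions> as the root.
XML = """
<actions>
    <action><name>Action_1</name></action>
    <action><name>action_1</name></action>
    <action><name>Action_2</name></action>
    <action><name>ACTION_2</name></action>
</actions>
"""

root = ET.fromstring(XML)

seen = set()      # lowercased names encountered so far
case_dupes = []   # original spellings that repeat an earlier name

for name in root.findall('./action/name'):
    key = name.text.lower()
    if key in seen:
        case_dupes.append(name.text)
    else:
        seen.add(key)

print(case_dupes)
```

Note that this reports any repeat after lowercasing, so it does not distinguish an exact repeat from a repeat that differs only in case.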

list_names = ['Action_1', 'action_1', 'Action_2', 'ACTION_2']

list_names = [name.lower() for name in list_names]

name_counts = dict((name, list_names.count(name)) for name in set(list_names))

name_counts is then:

{'action_2': 2, 'action_1': 2}

Alternatively, you could use collections.Counter, available from Python 2.7 onward.

import collections
name_counts = collections.Counter(list_names)

name_counts is then a Counter object, which is a subclass of dict:

Counter({'action_1': 2, 'action_2': 2})
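
To answer the original question from these counts, one possible follow-up step is to keep only the names whose count exceeds one. This is a sketch assuming Python 3; the extra entry 'Action_3' and the names repeated and all_unique are hypothetical, added to show a name that is not flagged.

```python
from collections import Counter

# Hypothetical sample: the question's names plus one unique entry.
list_names = ['Action_1', 'action_1', 'Action_2', 'ACTION_2', 'Action_3']

# Count names case-insensitively.
name_counts = Counter(name.lower() for name in list_names)

# Lowercased names that occur more than once.
repeated = sorted(name for name, count in name_counts.items() if count > 1)

# True only when no name repeats, ignoring case.
all_unique = not repeated

print(repeated)
print(all_unique)
```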

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow