Question

I’ve got a simple bit of code that matches network usernames found in two text files. I’ve tried to normalize the input by converting both files to uppercase, but I need to take it a stage further and get my code to produce partial matches of usernames. I may have SMITH, JOHN in one list and perhaps SMITH,JOHN (FINANCE) in the other. I’ve looked at FuzzyWuzzy, but I’ve only been learning Python for a couple of weeks and I’m struggling to understand how to use it in my script.

with OpenUpperCase(filename, "r") as file1:
    for line in islice(file1, 20, None):
        with OpenUpperCase("c:\\Files\\Usernames.txt", "r") as file2:
            files= filename.upper().split("\\")
            int1=files[3].strip()
            filedate=int1[0:-4]
            list2 = file2.readlines()
            for i in file1:
                for j in list2:
                    if i == j:

This is what I have so far, not great coding probably, but it seems to work. Any thoughts as to how I can get a fuzzy match of my usernames please? Many thanks for any help you could provide.

EDIT.

Typically, my lists look like this, though with hundreds of users.

File1

Salt, William (old user)
Wilds, Tony
Smith, William (Old User)
JONES,Steven (Old User)

File2

Salt, Bill
Wilds, Tony (SALES)
Smith,Will (OLD USER)
JONES,STEVEN (ACCOUNTS)

Solution

Using io.StringIO objects in place of real files, for simplicity:

import io

file1 = io.StringIO("""Salt, William (old user)
Wilds, Tony
Smith, William (Old User)
JONES,Steven (Old User)""")

file2 = io.StringIO("""Salt, Bill
Wilds, Tony (SALES)
Smith,Will (OLD USER)
JONES,STEVEN (ACCOUNTS)""")

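Read all names into a set of lowercase (surname, first name) tuples, dropping anything after the first name, such as "(SALES)":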

def read_file(fobj):
    names = set()
    for line in fobj:
        split_line = line.lower().split(',')
        names.add((split_line[0], split_line[1].split()[0]))
    return names
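Note that read_file assumes every line contains a comma followed by at least one word; a blank line or a name without a comma raises an IndexError. A more forgiving variant might look like the sketch below. The names parse_name and read_names are hypothetical helpers, not part of the original answer, and the rule of skipping unparseable lines is an assumption about what you want.

```python
import io
import re

# Hypothetical helpers, not from the original answer.
# Matches a parenthesised annotation such as "(SALES)" or "(old user)".
_ANNOTATION = re.compile(r"\([^)]*\)")

def parse_name(line):
    """Return (surname, first_name) in lowercase, or None if the line
    cannot be split into a surname and a first name."""
    cleaned = _ANNOTATION.sub("", line).strip().lower()
    if "," not in cleaned:
        return None
    surname, _, rest = cleaned.partition(",")
    surname = surname.strip()
    first_names = rest.split()
    if not surname or not first_names:
        return None
    return (surname, first_names[0])

def read_names(fobj):
    """Collect every parseable name in the file object into a set."""
    names = set()
    for line in fobj:
        parsed = parse_name(line)
        if parsed is not None:  # assumption: silently skip bad lines
            names.add(parsed)
    return names

# Blank lines and lines without a comma are skipped instead of crashing.
sample = io.StringIO("Wilds, Tony (SALES)\n\nno comma here\nJONES,STEVEN (ACCOUNTS)\n")
parsed_names = read_names(sample)
```

It can stand in for read_file in the steps that follow.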

For each file:

data1 = read_file(file1)
data2 = read_file(file2)

A simple intersection will do:

data1.intersection(data2)

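Result (shown as Python 2 prints it; Python 3 displays the same set with braces, and the element order may vary):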

set([('wilds', 'tony'), ('jones', 'steven')])
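The intersection only finds names whose surname and first name agree exactly, so Smith, William and Smith,Will are not paired. For partial matches, one option is to require the same surname and compare first names with difflib.SequenceMatcher from the standard library. This is a minimal sketch rather than the answer's own method: fuzzy_matches and the 0.7 threshold are assumptions chosen for this sample data, and difflib is swapped in here for FuzzyWuzzy so that nothing needs to be installed.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity ratio between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()

def fuzzy_matches(names1, names2, threshold=0.7):
    """Pair up (surname, first_name) tuples that share a surname and
    whose first names are at least `threshold` similar.
    The threshold is an assumption; tune it on real data."""
    matches = []
    for surname1, first1 in sorted(names1):
        for surname2, first2 in sorted(names2):
            if surname1 != surname2:
                continue
            score = similarity(first1, first2)
            if score >= threshold:
                matches.append(((surname1, first1), (surname2, first2), score))
    return matches

# The same data that read_file produces for the two sample files.
data1 = {("salt", "william"), ("wilds", "tony"),
         ("smith", "william"), ("jones", "steven")}
data2 = {("salt", "bill"), ("wilds", "tony"),
         ("smith", "will"), ("jones", "steven")}

for name1, name2, score in fuzzy_matches(data1, data2):
    print(name1, name2, round(score, 2))
```

With this data, william/will scores about 0.73 and is reported, while william/bill scores about 0.55 and is not. Nicknames such as Bill for William are not close as strings, so catching them would need a separate lookup table of known nicknames rather than a lower threshold.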
Licensed under: CC-BY-SA with attribution