Compare N lists and insert/append blank entries where discrepancies exist

https://stackoverflow.com/questions/22361454

13-06-2023
|

Question

I'm attempting handle several long lists (around 2000 entries in each). The goal is to sort of "diff" the lists by looking at them side by side and seeing where differences exist.

Example:

a = ['andrew', 'benjamin', 'franklin', 'erica', 'robert', 'jessica']
b = ['andrew', 'franklin', 'robert', 'delmar']
c = ['aaron', 'erica']

a.sort()
b.sort()
c.sort()

#do some operation that leaves us with:

a = [ '', 'andrew', 'benjamin', '', 'erica', 'franklin', 'jessica', 'robert']
b = ['', 'andrew', '', 'delmar', '', 'franklin', '', 'robert']
c = ['aaron', '', '', '', 'erica', '', '', '']

In the end you have three lists, such that if you were to print them out entry by entry you would have a sort of "diff" view, showing where discrepancies are between lists. Each list is the same length.

I'd like this to work with any input, and I've been working on something using numbers. To have it handle any amount of lists, i've been passing a list of lists to it. Here is my code so far:

def recursiveCompare(allLists, compareList):
    if compareList >= 0:
        for i in xrange(len(allLists[-1])):
            if (i <= (len(allLists[compareList]) - 1)):
                if (allLists[compareList][i] > allLists[-1][i]):
                    allLists[compareList].insert(i, "")
                if (allLists[compareList][i] < allLists[-1][i]):
                    allLists[-1].insert(i, "")
            else:
                if (allLists[compareList]) < (allLists[-1]):
                    allLists[compareList].append("")
                if (allLists[-1]) < (allLists[compareList]):
                    allLists[-1].append("")

        return recursiveCompare(allLists, compareList - 1)
    else:
        return allLists

c = [2, 3, 4, 5, 6]
b = [2, 3, 5]
a = [1, 2, 7]

allLists = [a, b, c]
allLists.sort(key=len)

allListsNew = recursiveCompare(allLists, len(allLists) - 1)

In the end, with this code, I get:

[[1, 2, 7], [2, 3, 5], [2, 3, 4, 5, 6]] -> [[1, 2, '', '', '', '', 7], [2, 3, '', 5], ['', 2, 3, 4, 5, 6, '']]

Which looks mostly correct, aside from the middle list. However, looks are deceiving and when run with only the first and last list, results aren't quite what I'd want.

I'm no python expert by any means, and I'm fairly sure there is not an efficient way to do this. But I'm definitely stuck and could use some eyes on it, or a pointer in the right direction if someone knows of a similar problem as this.

Thank you.

Solution

You can use sets to do the element checking efficiently:

c = [2, 3, 4, 5, 6]
b = [2, 3, 5]
a = [1, 2, 7]

allLists = [a, b, c]

allSets = list(map(set, allLists))
allTokens = set.union(*allSets)
sortedTokens = sorted(allTokens)

and transpose a list comprehension of the strings if they exist in their respective sets:

transposed = [tuple(map(lambda x: w if w in x else '', allSets))
              for t in sortedTokens]
result = zip(*transposed)

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow