Question

I have a list of lists ("sublists") and I want to see if the same sequence of any unspecified length occurs in more than one sublist. To clarify, the order of items must be preserved - I do not want the intersection of each sublist as a set. There must be at least 2 items that match sequentially. Please see example below.

Input:

someList = [[0,1,3,4,3,7,2],[2,3,4,3],[0,3,4,3,7,3]]

Desired Output: (will be printed to file but don't worry about this detail)

sublist0_sublist1 = [3,4,3] #intersection of 1st and 2nd sublists

sublist0_sublist2 = [3,4,3,7] #intersection of 1st and 3rd sublists

sublist1_sublist2 = [3,4,3] #intersection of 2nd and 3rd sublists

Was it helpful?

Solution

Whipped this up for you (including your comment that equal-length maximum sublists should all be returned in a list):

def sublists(list1, list2):
    subs = []
    for i in range(len(list1)-1):
        for j in range(len(list2)-1):
            if list1[i]==list2[j] and list1[i+1]==list2[j+1]:
                m = i+2
                n = j+2
                while m<len(list1) and n<len(list2) and list1[m]==list2[n]:
                    m += 1
                    n += 1
                subs.append(list1[i:m])
    return subs

def max_sublists(list1, list2):
    subls = sublists(list1, list2)
    if len(subls)==0:
        return []
    else:
        max_len = max(len(subl) for subl in subls)
        return [subl for subl in subls if len(subl)==max_len]

This works allright for these cases:

In [10]: max_sublists([0,1,3,4,3,7,2],[0,3,4,3,7,3])
Out[10]: [[3, 4, 3, 7]]
In [11]: max_sublists([0,1,2,3,0,1,3,5,2],[1,2,3,4,5,1,3,5,3,7,3])
Out[11]: [[1, 2, 3], [1, 3, 5]]

It's not pretty though, nor is it really fast.

You only have to figure out how to compare every sublist in your original list of sublists, but that should be easy.

[Edit: I fixed a bug and prevented your error from occurring.]

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top