Question

How can I filter a list based on another list which contains partial values and wildcards? The following example is what I have so far:

l1 = ['test1', 'test2', 'test3', 'test4', 'test5']
l2 = set(['*t1*', '*t4*'])

filtered = [x for x in l1 if x not in l2]
print(filtered)

This example results in:

['test1', 'test2', 'test3', 'test4', 'test5']

However, I am looking to limit the results based on l2 to the following:

['test2', 'test3', 'test5']

Solution

Use the fnmatch module and a list comprehension with any():

>>> from fnmatch import fnmatch
>>> l1 = ['test1', 'test2', 'test3', 'test4', 'test5']
>>> l2 = set(['*t1*', '*t4*'])
>>> [x for x in l1 if not any(fnmatch(x, p) for p in l2)]
['test2', 'test3', 'test5']
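
The same idea can be wrapped in a small reusable function. The sketch below is only an illustration: the name exclude_matching is made up here, and it swaps fnmatch() for fnmatchcase() on the assumption that you want matching to be case-sensitive on every platform (fnmatch() normalizes case according to the operating system, so it is case-insensitive on Windows).

```python
from fnmatch import fnmatchcase


def exclude_matching(items, patterns):
    """Return the items that match none of the shell-style patterns.

    Hypothetical helper name; fnmatchcase() is used instead of fnmatch()
    so that matching stays case-sensitive regardless of the platform.
    """
    return [x for x in items if not any(fnmatchcase(x, p) for p in patterns)]


l1 = ['test1', 'test2', 'test3', 'test4', 'test5']
l2 = {'*t1*', '*t4*'}
print(exclude_matching(l1, l2))  # ['test2', 'test3', 'test5']
```

If the platform's own case rules are what you want, replacing fnmatchcase with fnmatch gives the behavior of the one-liner above.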

OTHER TIPS

You can also use filter() instead of the list comprehension, which has the advantage that you can easily swap in a different filter function for more flexibility:

>>> from fnmatch import fnmatch
>>> l1 = ['test1', 'test2', 'test3', 'test4', 'test5']
>>> l2 = set(['*t1*', '*t4*'])
>>> filterfunc = lambda item: not any(fnmatch(item, pattern) for pattern in l2)
>>> # list() is needed on Python 3, where filter() returns an iterator
>>> list(filter(filterfunc, l1))
['test2', 'test3', 'test5']
>>> # Suppose every entry in l2 should count as a partial (substring) match;
>>> # then the asterisks can be dropped:
>>> l2 = set(['t1', 't4'])
>>> filterfunc = lambda item: not any(pattern in item for pattern in l2)
>>> list(filter(filterfunc, l1))
['test2', 'test3', 'test5']

This way, you can even generalize your filterfunc to work with several pattern sets:

>>> from functools import partial
>>> def filterfunc(item, patterns):
...     return not any(pattern in item for pattern in patterns)
...
>>> list(filter(partial(filterfunc, patterns=l2), l1))
['test2', 'test3', 'test5']
>>> list(filter(partial(filterfunc, patterns={'t1', 'test5'}), l1))
['test2', 'test3', 'test4']

And of course you could easily upgrade your filterfunc to accept regular expressions in the pattern set, for example.
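
As a rough sketch of what such a regex-aware filterfunc might look like (the use of re.search() and the example pattern are assumptions for illustration, not part of the original answer):

```python
import re
from functools import partial


def filterfunc(item, patterns):
    """Keep item only if no regular expression in patterns matches part of it."""
    # re.search() scans the whole string, so a pattern may match anywhere
    return not any(re.search(pattern, item) for pattern in patterns)


l1 = ['test1', 'test2', 'test3', 'test4', 'test5']
# Illustrative pattern: a 't' followed by 1 or 4 at the end of the string
regex_patterns = [r't[14]$']

result = list(filter(partial(filterfunc, patterns=regex_patterns), l1))
print(result)  # ['test2', 'test3', 'test5']
```

If the same patterns are applied to many items, compiling them once with re.compile() and passing the compiled objects in would avoid repeated lookups in the regex cache.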

I think the simplest approach for your use case is to test for substrings using Python's in operator (although this means removing the asterisks from your patterns):

def remove_if_not_substring(l1, l2):
    # Keep only the items of l1 that contain none of the substrings in l2
    return [i for i in l1 if not any(j in i for j in l2)]

Here's our data:

l1 = ['test1', 'test2', 'test3', 'test4', 'test5']
l2 = set(['t1', 't4'])

And calling our function with it:

remove_if_not_substring(l1, l2)

returns:

['test2', 'test3', 'test5']
Licensed under: CC-BY-SA with attribution