Question

I'm using Beautiful Soup in Python to scrape some data from HTML files. In some cases, Beautiful Soup returns lists that contain both string and NoneType objects. I'd like to filter out all the NoneType objects.

In Python, lists containing NoneType objects are not iterable, so a list comprehension isn't an option for this. Specifically, if I have a list lis containing NoneTypes and I try something like [x for x in lis (some condition/function)], Python throws the error TypeError: argument of type 'NoneType' is not iterable.

As we've seen in other posts, it's straightforward to implement this functionality in a user-defined function. Here's my flavor of it:

def filterNoneType(lis):
    lis2 = []
    for l in lis:  # filter out NoneType
        if type(l) == str:
            lis2.append(l)
    return lis2

However, I'd love to use a built-in Python function for this if it exists. I always like to simplify my code when possible. Does Python have a built-in function that can remove NoneType objects from lists?

Solution

I think the cleanest way to do this would be:

# lis = some list containing None values
filter(None, lis)
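
A quick sketch of what this call does on Python 3, using a made-up sample list (the values are an assumption, not from the question): filter returns a lazy iterator rather than a list, and passing None as the function drops every falsy item, not only None.

```python
# Hypothetical sample data, mixing None with other falsy values.
lis = ['a', None, '', 'b', 0, None]

# On Python 3, filter() returns an iterator, so wrap it in list() to materialize it.
cleaned = list(filter(None, lis))

print(cleaned)  # ['a', 'b']
```

If empty strings or zeroes must survive, an explicit "is not None" test is the safer choice.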

OTHER TIPS

You can do this using list comprehension:

clean = [x for x in lis if x != None]

As pointed out in the comments, you could also use is not, which tests identity instead of calling the element's __ne__ method and is the idiomatic check for None:

clean = [x for x in lis if x is not None]

You could also use filter (note: this will also filter out empty strings and other falsy values such as 0; if you want more control over what you filter, you can pass a function instead of None):

clean = filter(None, lis)
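
As a rough illustration of passing a function instead of None, here is a sketch with a hypothetical predicate named is_string that mirrors the type check in the question's helper. The sample list is invented for the example.

```python
def is_string(item):
    """Keep only str items (hypothetical helper, mirroring the question's type check)."""
    return isinstance(item, str)

# Made-up sample data: strings mixed with None values.
lis = ['href-1', None, '', 'href-2', None]

# The predicate decides what stays, so the empty string is kept here.
clean = list(filter(is_string, lis))

print(clean)  # ['href-1', '', 'href-2']
```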

There is always the itertools approach if you want more efficient looping, but these basic approaches should work for most day to day cases.
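
The answer does not say which itertools function it has in mind. One possibility on Python 3, offered only as a sketch, is itertools.filterfalse, which lazily yields the items for which the predicate is false:

```python
from itertools import filterfalse

# Made-up sample data; 0 is included to show that only None is dropped.
lis = ['a', None, 0, 'b', None]

# filterfalse yields items where the predicate returns a falsy result,
# i.e. everything that is not None.
lazy_clean = filterfalse(lambda x: x is None, lis)

result = list(lazy_clean)
print(result)  # ['a', 0, 'b']
```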

A list comprehension, as other answers proposed, or, for the sake of completeness:

clean = filter(lambda x: x is not None, lis)

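If the list is huge, a lazy iterator is superior because it avoids building a second list in memory (Python 2 only; in Python 3 the built-in filter is already lazy):

from itertools import ifilter  # removed in Python 3
clean = ifilter(lambda x: x is not None, lis)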
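
On Python 3, where ifilter is not available, a generator expression gives the same lazy behaviour. A minimal sketch with invented sample data:

```python
# Made-up sample data.
lis = ['a', None, 'b', None, 'c']

# A generator expression produces items on demand instead of building a list.
clean = (x for x in lis if x is not None)

first = next(clean)   # only the first matching item is computed here
rest = list(clean)    # the remaining items are consumed afterwards

print(first, rest)  # a ['b', 'c']
```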

For those who came here from Google — do not use this!

UPD 2021:
When this answer was written, the proposed implementation was perfectly valid in terms of language semantics, though it was an obvious hack. Things have changed since then: starting with Python 3.9, evaluating NotImplemented in a boolean context is deprecated. Here is an excerpt from the Python docs:

Evaluating NotImplemented in a boolean context is deprecated. While it currently evaluates as true, it will emit a DeprecationWarning. It will raise a TypeError in a future version of Python.

I will keep this answer for the sake of history, but please be aware that even in its time this was rather hacky. Stick to the proposed list comprehension solutions or filter + lambda, according to your requirements.

Original answer:
As of the beginning of 2019, Python has no built-in function for filtering None values that avoids the common pitfalls of also deleting zeroes, empty strings, etc.

In Python 3 you can implement this using the .__ne__ dunder method (or 'magic method' if you will):

>>> list1 = [0, 'foo', '', 512, None, 0, 'bar']
>>> list(filter(None.__ne__, list1))
[0, 'foo', '', 512, 0, 'bar']

This is how it works:

  • None.__ne__(None) --> False

  • None.__ne__(anything) --> NotImplemented

NotImplemented is not an exception but a special singleton value; at the time of writing it was truthy, e.g.:

>>> bool(None.__ne__('Something'))
True
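
For readers who like the lambda-free style of this trick, here is a sketch of an alternative that never touches NotImplemented. It is not part of the original answer: it combines functools.partial with operator.is_not, and the name not_none is made up for the example.

```python
from functools import partial
from operator import is_not

list1 = [0, 'foo', '', 512, None, 0, 'bar']

# not_none(x) evaluates `None is not x`, which is a real bool.
not_none = partial(is_not, None)

result = list(filter(not_none, list1))
print(result)  # [0, 'foo', '', 512, 0, 'bar']
```

Because the predicate returns True or False directly, it is unaffected by the deprecation described above.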

You could easily remove all NoneType objects from a list using a list comprehension:

lis = [i for i in lis if i is not None]
Licensed under: CC-BY-SA with attribution