Question

Before anyone mentions it, I've been all over stackoverflow and Google to find an answer to this and I believe that I may just be doing it wrong.

I'm scraping an XML document and placing values into variables using BeautifulSoup4. Right now I'm reading the scraped values into a dictionary and iterating through the dictionary to find the values that I need. However, when I want to print these values into a report with a template, I get the following error:

TypeError: coercing to Unicode: need string or buffer, NoneType found

Which I've found is the result of having a None value as one of the values in my dictionary. So far I've been trying to find a way to iterate over my dictionary in Python 2.7 to remove or replace the NoneType values, but nothing seems to work. Some solutions I've found have been the filter(None, list) function, for k, v in dictionary: if v is not None: list.append(item), clean = [x for x in list if x != None], using lambda, and so on. None of them seem to work, which makes me believe I must be doing something wrong. For example, this is how I set up my dictionary:

itemDict = []

for item in soup3.find_all('XMLTag'):
    r = {
        'definition1': item.Starttag.string,
        'definition2': item.Stoptag.string,
        'definition3': item.Filltag.string,
        'definition4': item.Stoptag2.string,
    }
    itemDict.append(r)

But moving through itemDict to get rid of or replace the NoneTypes has been a pain. The end result I was planning was to place the items from the dictionary into a piece of template code to be printed as a report, for example """<Description>"""+itemDict[0]['definition4']+"""</Description>""". Any thoughts?

EDIT:

The solution was actually really simple, thanks to Martijn Pieters and Steve Jessop.

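itemDic = []

for newdic in soup3.find_all("XMLTag"):
    # Each key must be unique; repeating 'definition1' would keep only the last value.
    s = {
        'definition1': newdic.Order.string,
        'definition2': newdic.Code.string,
        'definition3': newdic.Description.string,
    }

    # Replace any None value (a tag with no text) with an empty string.
    for k in s:
        if s[k] is None:
            s[k] = ''

    itemDic.append(s)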

This replaced all NoneTypes that came across from the XML scrape using BeautifulSoup4 with empty strings. Similarly, the code above can substitute any other value for a given condition. For example, if I wanted to change every instance of 'fabulous' to 'it was just okay', I would change the test from s[k] is None to s[k] == 'fabulous' and replace the empty string, '', with 'it was just okay', and voilà! Thanks again, you guys.
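For anyone who wants to reuse this, here is a minimal sketch of the same replacement wrapped in a helper. The name replace_none and the sample data are made up for illustration, and plain dicts stand in for the values BeautifulSoup would return.

```python
def replace_none(records, default=''):
    """Return a new list of dicts with every None value swapped for `default`.

    `records` is assumed to be a list of flat dicts, like itemDic above.
    The input dicts are left unchanged.
    """
    return [
        dict((k, default if v is None else v) for k, v in record.items())
        for record in records
    ]


# Hypothetical sample data standing in for the scraped values.
sample = [
    {'definition1': '10', 'definition2': None, 'definition3': 'Widget'},
    {'definition1': None, 'definition2': 'B7', 'definition3': None},
]

cleaned = replace_none(sample)
with_placeholder = replace_none(sample, default='N/A')
```

Building a new list instead of editing in place keeps the original scrape around, which might help if the report later needs to tell "missing" apart from "empty".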

Solution 2

r = {
    'definition1': item.Starttag.string,
    'definition2': item.Stoptag.string,
    'definition3': item.Filltag.string,
    'definition4': item.Stoptag2.string,
}

new_r = dict((k, v) for k, v in r.iteritems() if v is not None)

But it looks as if you're later going to write new_r['definition4'], so removing the keys with None entries will just change the exception to a different one. You should make an actual decision what you want to do about the missing data ;-)
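As a rough sketch of one such decision: if you do drop the None entries, you could read the values back with dict.get and a fallback, so a missing key yields a default instead of raising KeyError. The sample values and the empty-string fallback here are assumptions for illustration, not something your data requires.

```python
# Hypothetical result of filtering: 'definition4' was None, so its key is gone.
new_r = {'definition1': 'start', 'definition2': 'stop', 'definition3': 'fill'}

# new_r['definition4'] would raise KeyError; .get() falls back to '' instead.
description = (
    '<Description>' + new_r.get('definition4', '') + '</Description>'
)

# A key that is present is returned as usual.
start = '<Start>' + new_r.get('definition1', '') + '</Start>'
```

Whether an empty element, a placeholder such as 'N/A', or skipping the element entirely is right depends on what the report's consumer expects.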

OTHER TIPS

Why not create a dictionary without None values in the first place?

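tags = ('Starttag', 'Stoptag', 'Filltag', 'Stoptag2')

itemDict = []

for item in soup3.find_all('XMLTag'):
    r = {}
    # Number the keys from 1 so they match 'definition1' ... 'definition4'.
    for i, tag in enumerate(tags, 1):
        value = getattr(item, tag).string
        # Only store the value when the tag actually has text.
        if value is not None:
            r['definition' + str(i)] = value
    itemDict.append(r)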
Licensed under: CC-BY-SA with attribution