Python Beautiful Soup - function

Question

Okay, firstly: you create a urllib2 opener in your code, but then you fetch the webpage with urllib2.urlopen(), so you aren't using your opener or any of the extra handlers you went to the trouble of creating. Also, since a username and password are specified in your code, I'm assuming you'll be logging into a website at some point. If that's the case, you'll be in a world of hurt without cookie handling, since most sites track a login session with cookies. I've reorganized your code a bit, and I think the following should be a good starting point for you.
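To see why the opener goes unused, here is a minimal sketch that needs no network access. DemoHandler and the demo:// scheme are made up purely for illustration, and the try/except import only lets the same code run where urllib2 has been renamed urllib.request (Python 3). Under those assumptions it shows that urlopen() ignores handlers that were only passed to build_opener():

```python
import io

try:  # Python 2 names, as used in this answer
    import urllib2 as urlreq
    from urllib import addinfourl
except ImportError:  # Python 3 renamed the modules
    import urllib.request as urlreq
    from urllib.response import addinfourl


class DemoHandler(urlreq.BaseHandler):
    """Hypothetical handler that answers demo:// URLs locally."""

    def demo_open(self, req):
        body = io.BytesIO(b"handled by custom opener")
        return addinfourl(body, {}, req.get_full_url())


opener = urlreq.build_opener(DemoHandler())

# Going through the opener reaches the custom handler.
via_opener = opener.open("demo://page").read()

# urlopen() uses a separate module-level opener, so the handler is skipped
# and the unknown scheme is rejected.
try:
    urlreq.urlopen("demo://page")
    via_urlopen = "custom handler used"
except urlreq.URLError:
    via_urlopen = "custom handler ignored"

# install_opener() makes urlopen() use the custom opener from then on.
urlreq.install_opener(opener)
after_install = urlreq.urlopen("demo://page").read()

print(via_urlopen)
```

So you either call opener.open(url) directly, or call install_opener(opener) once and keep using urlopen().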

Also, here is a walkthrough of what your function actually does with the operations you specified:

It searches the entire BeautifulSoup object for an unordered list with the class dfwp-column dfwp-list.
The td variable is set to all 'td' tags in that match.
The tr variable is set to all 'tr' tags in that same match.
Even though you haven't done anything with those two variables yet, you destroy them by creating a loop that reuses the same names. The loop overwrites their values, so the earlier assignments meant absolutely nothing.
For every table with that class name (hint: there is only one table defined, and in that form the "for tr in td" does absolutely nothing), it prints the cleaned result.

In short, it doesn't do what it looks like it does.
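The overwriting is easy to reproduce without any HTML. In this small sketch, plain lists stand in for the results of findAll('td') and findAll('tr'), and the tables list is a made-up placeholder for the single matching table:

```python
# Plain lists stand in for the results of findAll('td') and findAll('tr').
td = ["cell-1", "cell-2", "cell-3"]
tr = ["row-1", "row-2"]

tables = [["only-table"]]  # hypothetical: a single matching table

visited = []
for td in tables:      # rebinds td, discarding the list assigned above
    for tr in td:      # rebinds tr the same way
        visited.append(tr)

print(td)       # the last item the outer loop saw, not the original cells
print(tr)       # the last item the inner loop saw, not the original rows
print(visited)
```

Neither of the original lists is ever looked at, which is exactly what happens to your td and tr variables.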

To avoid this, here is a new function with the operations you specified:

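def myfunction(b):
    """Print the tag-stripped text of each 'tr' in each 'td'; b is a soup instance."""
    # Find the unordered list carrying the 'dfwp-column dfwp-list' class
    table = b.find('ul', {'class': 'dfwp-column dfwp-list'})
    if table is None:
        return  # nothing on the page matched
    for td in table.findAll('td'):
        for tr in td.findAll('tr'):
            # Hand bleach text rather than a Tag object
            print bleach.clean(unicode(tr), tags=[], strip=True)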

Much less code, and this way it finds the correct data and iterates correctly, like so:

table is the unordered list with the 'dfwp-column dfwp-list' class.
It prints the result of the bleach operation on every 'tr' tag found in every 'td' tag found in the table.
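For a feel of what the bleach call with tags=[] and strip=True leaves behind, here is a rough stand-in built on the standard library's HTMLParser. It is not bleach and does none of bleach's escaping or sanitizing. TextOnly and strip_tags are names invented for this sketch, and the sample row is made up:

```python
try:
    from HTMLParser import HTMLParser  # Python 2
except ImportError:
    from html.parser import HTMLParser  # Python 3


class TextOnly(HTMLParser):
    """Collects text nodes and ignores every tag."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(markup):
    """Return only the text content of an HTML fragment."""
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


row_markup = "<tr><td>Alpha</td><td><b>Beta</b></td></tr>"  # made-up sample
text = strip_tags(row_markup)
print(text)
```

The markup is dropped and only the text nodes survive, which is roughly what you should expect to see printed for each row.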

Just trying to be helpful. I've cleaned up and reordered your code to eliminate some waste, and added the things already mentioned. Try this for now:

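from ntlm import HTTPNtlmAuthHandler
from bs4 import BeautifulSoup
import requests, os, bleach, urllib2, cookielib

user='XXX'
password='XXX'
url='URL'

# Cookie jar so that cookies set by the site are kept between requests
cookies = cookielib.CookieJar()
# Realm None: use these credentials for any realm, for URLs under url
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, user, password)
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookies),HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman))

# Fetch through the opener (not urllib2.urlopen) so the handlers above apply
pagedata=opener.open(url)
# Name the parser explicitly so the result doesn't depend on what is installed
soup=BeautifulSoup(pagedata, 'html.parser')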
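If the None passed to add_password looks odd, this small sketch shows what it does, again without touching the network. The host names and credentials are placeholders invented for the example, and the try/except import is only there so it also runs where urllib2 is called urllib.request:

```python
try:  # Python 2 name, as used in this answer
    import urllib2 as urlreq
except ImportError:  # Python 3 renamed the module
    import urllib.request as urlreq

# Placeholder values for illustration only.
url = "http://intranet.example.com/site/"
user = "DOMAIN\\someuser"
password = "not-a-real-password"

passman = urlreq.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, user, password)

# Any realm name falls back to the default (None) entry for URLs under `url`.
found = passman.find_user_password("Some Realm", url + "page.aspx")

# A different host has no entry, so no credentials are returned.
missing = passman.find_user_password("Some Realm", "http://other.example.com/")

print(found)
print(missing)
```

That fallback is why you don't need to know the realm the server announces ahead of time.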