Question

I have half-written some code to pull the titles and links from an RSS feed, but it fails with a NoneType error ('NoneType' object has no attribute 'getText'). The error occurs in both functions when getting the text. I want to strip the title and link tags from the string passed in.

from bs4 import BeautifulSoup
import urllib.request
import re

def getlink(a):
    a= str(a)
    bsoup=BeautifulSoup(a)
    a=bsoup.find('link').getText()
    return a

def gettitle(b):
    b=str(b)
    bsoup=BeautifulSoup(b)
    b=bsoup.find('title').getText()
    return b

webpage= urllib.request.urlopen("http://feeds.feedburner.com/JohnnyWebber?format=xml").read()

soup=BeautifulSoup(webpage)
titlesoup=soup.findAll('title')
linksoup= soup.findAll('link')

for i,j in zip(titlesoup,linksoup):
    i = getlink(i)
    j= gettitle(j)

    print (i)
    print(j)
    print ("\n")
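The loop passes each title tag to getlink and each link tag to gettitle, so each helper searches for a tag that its input does not contain. find() then returns None, and calling getText() on None raises the error. Below is a minimal reproduction. It assumes a small hypothetical inline item in place of the FeedBurner URL and uses Python's built-in html.parser.

```python
from bs4 import BeautifulSoup

# Hypothetical one-item feed standing in for the live FeedBurner URL
SAMPLE = "<item><title>First post</title><link>http://example.com/1</link></item>"

def getlink(a):
    # Re-parse the tag's markup and look for a <link> inside it
    bsoup = BeautifulSoup(str(a), "html.parser")
    return bsoup.find("link").getText()

soup = BeautifulSoup(SAMPLE, "html.parser")
title_tag = soup.find("title")

try:
    # Same call as `i = getlink(i)` in the loop: i is a <title> tag here
    getlink(title_tag)
    error_message = None
except AttributeError as exc:
    # "<title>First post</title>" contains no <link>, so find() returned None
    error_message = str(exc)

print(error_message)
```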

EDIT: falsetru's method worked perfectly.

I have one more question: can text be extracted from any tag just by calling getText()?
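In general, getText() (a legacy alias of get_text()) works on any Tag. It returns all the text inside that tag, including the text of nested tags, joined together. The following sketch uses made-up markup. The optional separator and strip arguments control how the pieces are joined.

```python
from bs4 import BeautifulSoup

# Made-up markup with nested tags
soup = BeautifulSoup("<div><p>Hello <b>world</b></p><p>again</p></div>", "html.parser")

# All descendant strings are joined with no separator by default
plain = soup.div.getText()
# The separator goes between strings; strip=True trims each string first
spaced = soup.div.getText(" ", strip=True)
# Works on a single nested tag too
bold = soup.b.getText()

print(plain)   # Hello worldagain
print(spaced)  # Hello world again
print(bold)    # world
```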


Solution 2

i and j are already the title and link tags. Why search for them again?

for i, j in zip(titlesoup, linksoup):
    print(i.getText())
    print(j.getText())
    print("\n")

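Besides that, pass features='xml' to BeautifulSoup when you parse an XML file such as an RSS feed. HTML parsers treat <link> as an empty element, so the URL text may end up outside the tag. BeautifulSoup's XML mode requires lxml to be installed.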

soup = BeautifulSoup(webpage, features='xml')
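The following self-contained sketch combines both suggestions. It parses a small hypothetical RSS snippet instead of fetching the live feed, and it assumes lxml is installed because BeautifulSoup's XML mode depends on it. For the real feed, pass the bytes returned by urllib.request.urlopen(...).read() in place of SAMPLE_RSS.

```python
from bs4 import BeautifulSoup

# Hypothetical RSS document; substitute the bytes fetched with urllib.request
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""

# features="xml" selects lxml's XML parser, which keeps the text inside <link>
soup = BeautifulSoup(SAMPLE_RSS, features="xml")

pairs = []
for title, link in zip(soup.find_all("title"), soup.find_all("link")):
    # title and link are already Tag objects, so read their text directly
    pairs.append((title.getText(), link.getText()))
    print(title.getText())
    print(link.getText())
    print()
```

The channel's own title and link come first in document order, so they pair up with each other just as the item titles and links do.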

OTHER TIPS

I suspect the problem is in:

def getlink(a):
    ...
    a=bsoup.find('a').getText()
    ...

Remember that find matches tag names: there is no link tag, only an a tag. BeautifulSoup's find returns None when no tag matches, hence the NoneType error. Check the docs for details.
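You can make the None case explicit with a guard before calling getText(). In the following sketch, get_tag_text is a hypothetical helper and not part of BeautifulSoup, and the markup is made up.

```python
from bs4 import BeautifulSoup

def get_tag_text(markup, name):
    # Hypothetical helper: text of the first <name> tag, or None if there is none
    tag = BeautifulSoup(markup, "html.parser").find(name)
    # find() returns None when nothing matches, so test before calling getText()
    if tag is None:
        return None
    return tag.getText()

MARKUP = '<p>See <a href="http://example.com/1">First post</a></p>'
found = get_tag_text(MARKUP, "a")        # the <a> tag exists
missing = get_tag_text(MARKUP, "link")   # there is no <link> tag in this markup
print(found)    # First post
print(missing)  # None
```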

Edit:

If you really are looking for the text 'link', you can use bsoup.find(text=re.compile('link')).
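The following sketch of the text search uses made-up markup. Recent bs4 releases name this argument string= (text= is the older spelling). When only a string filter is given, find() returns the matching text node rather than a Tag, and that node's parent is the enclosing tag.

```python
import re
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Nothing here</p><p>Follow this link to read more</p>", "html.parser")
# With only a string filter, find() returns the first matching text node (a NavigableString)
match = soup.find(string=re.compile("link"))
print(match)              # Follow this link to read more
print(match.parent.name)  # p
```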

b = bsoup.find('title') returns None.

Try checking your input.
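To check the input, print what the helper actually receives before searching it, and handle the None case. The variant of gettitle below is a hypothetical rewrite that uses html.parser and made-up markup.

```python
from bs4 import BeautifulSoup

def gettitle(b):
    markup = str(b)
    # Show exactly what is being searched; a <link> tag here would mean the loop passed the wrong tag
    print("gettitle received:", markup)
    tag = BeautifulSoup(markup, "html.parser").find("title")
    if tag is None:
        return None  # no <title> in the input: the case that raised the error
    return tag.getText()

soup = BeautifulSoup("<item><title>First post</title><p>not a title</p></item>", "html.parser")
good = gettitle(soup.find("title"))  # a <title> tag: returns its text
bad = gettitle(soup.find("p"))       # a <p> tag contains no <title>: returns None
```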

Licensed under: CC-BY-SA with attribution