Question

I am trying to read an RSS feed using feedparser.

import feedparser
url = 'http://example.com/news.xml'
d = feedparser.parse(url)
f = open('rss.dat', 'w')
for e in d.entries:
    title = e.title
    print >>f, title
f.close()

It works fine with English RSS-feeds but I get a UnicodeEncodeError if I try to display a title written in Cyrillic letters. It happens when I:

  1. Try to write a title into a file.
  2. Try to display a title on the screen.
  3. Try to use it in a URL to access a web page.

My question is how to solve this problem easily. I would love to have a solution as simple as this:

new_title = some_function(title)

Maybe there is a way to replace every Cyrillic character with its HTML code?


Solution

FeedParser itself handles encodings correctly, except when the feed declares its encoding wrongly. See http://code.google.com/p/feedparser/issues/detail?id=114 for a possible explanation. It seems Python 2.5 uses ASCII as its default encoding, which causes the problem. Can you paste the actual feed URL, so we can see how the encoding is declared there? If it turns out that the declared encoding is wrong, you'll have to find a way to instruct FeedParser to override it.

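EDIT: Okay, it seems the error is in the print statement, which implicitly encodes the Unicode title as ASCII. Encode the title explicitly instead (unlike print, write does not append a newline, so add one yourself):

f.write(title.encode('utf-8') + '\n')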
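
For the three cases listed in the question, here is a minimal sketch. It assumes Python 3, where strings are already Unicode; on Python 2 the explicit encode call is still needed. The helper names write_titles, title_for_url and title_as_html_refs are made up for illustration and are not part of feedparser.

```python
import io
from urllib.parse import quote


def write_titles(titles, path):
    """Write each title on its own line, encoding the file as UTF-8."""
    with io.open(path, 'w', encoding='utf-8') as f:
        for title in titles:
            f.write(title + '\n')


def title_for_url(title):
    """Percent-encode the UTF-8 bytes of a title so it is safe in a URL."""
    return quote(title.encode('utf-8'))


def title_as_html_refs(title):
    """Replace every non-ASCII character with a numeric HTML reference."""
    return title.encode('ascii', 'xmlcharrefreplace').decode('ascii')
```

With a feed parsed as in the question, write_titles((e.title for e in d.entries), 'rss.dat') would cover the file case. title_as_html_refs comes closest to the "replace every Cyrillic symbol by its HTML code" idea, though writing UTF-8 directly is usually simpler.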
Licensed under: CC-BY-SA with attribution