Question

I'm trying to parse an Atom feed. For example:

>>> feedparser.parse("""
    <?xml version="1.0" encoding="utf-8"?>
    <feed xmlns:a="http://example.com">
      <entry>
        <a:name>123</a:name>
        <a:name xml:lang="es"></a:name>
      </entry>
    </feed>
""").entries[0]

{u'a_name': {'xml:lang': u'es'}}

Instead, I would like to get something like this:

{u'a_name': '123'}

or

{u'a_name': ['123', '']}

Curiously, if you change name to title, feedparser works fine.

But I need to parse custom tags from other namespaces.


Solution

From RFC 4287:

o atom:entry elements MUST contain exactly one atom:title element.

There is no mention of a name element as a child element of entry.

Section 6.3 says

When unknown foreign markup is encountered as a child of atom:entry, atom:feed, or a Person construct, Atom Processors MAY bypass the markup and any textual content and MUST NOT change their behavior as a result of the markup's presence.

FeedParser is a generic parser that works with many different types of feeds. As a consequence, various subtleties and more advanced usage may not be supported. In particular, it doesn't seem to support collecting repeated custom elements like the ones in your example (a quick glance at the source seems to confirm this).

In other words, you'll need to either modify FeedParser, find some other Atom parser (I'm not aware of any), or write something yourself.
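If you go the "write something yourself" route, a minimal sketch using the standard library's xml.etree.ElementTree might look like the following. It is not a replacement for FeedParser; it only pulls out the custom elements. The function name custom_names is made up for this illustration, and the code assumes the feed from the question, where entry has no namespace. In a real Atom feed, entry would normally live in the http://www.w3.org/2005/Atom namespace and would need to be looked up accordingly.

```python
import xml.etree.ElementTree as ET

# The feed from the question, used here as sample input.
FEED = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns:a="http://example.com">
  <entry>
    <a:name>123</a:name>
    <a:name xml:lang="es"></a:name>
  </entry>
</feed>
"""

# Prefix-to-URI map for the custom namespace used in the feed.
NS = {"a": "http://example.com"}


def custom_names(feed_xml):
    """Return, for each entry, the text of every a:name child (hypothetical helper)."""
    root = ET.fromstring(feed_xml.encode("utf-8"))
    results = []
    # Assumption: <entry> is un-namespaced, as in the question's sample feed.
    for entry in root.iter("entry"):
        # Empty elements have text None; normalize that to an empty string.
        results.append([el.text or "" for el in entry.findall("a:name", NS)])
    return results


print(custom_names(FEED))  # [['123', '']]
```

This gives the list form asked for in the question, one list per entry. If you also need the xml:lang attribute, it should be available on each element via el.attrib under the key "{http://www.w3.org/XML/1998/namespace}lang".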

License: CC-BY-SA with attribution
Not affiliated with StackOverflow