How to parse XML with colons in some tags?

https://stackoverflow.com/questions/21785054

11-10-2022

Question

I've been reading some tutorials on how to parse XML data with XmlPullParser in Android. To be more specific, I'm using the XML from https://gdata.youtube.com/feeds/api/standardfeeds/top_rated

Here is a simplified entry from this feed (I hope I haven't altered the structure):

<entry>
<id>http://gdata.youtube.com/feeds/api/videos/abc45678qwe</id>
[...]
<title type='text'>THE TITLE</title>
[...]
<link rel='alternate' type='text/html' href='https://www.youtube.com/watch?v=abc45678qwe&amp;feature=youtube_gdata'/>
[...]
<media:group>
[...]
<media:title type='plain'>THE TITLE</media:title>
<yt:duration seconds='300'/>
[...]
<yt:videoid>abc45678qwe</yt:videoid>
</media:group>
<gd:rating average='1' max='5' min='1' numRaters='1' rel='http://schemas.google.com/g/2005#overall'/>
<yt:statistics favoriteCount='0' viewCount='11111111'/>
<yt:rating numDislikes='111' numLikes='111'/>
</entry>

I successfully get the title and the link with:

private String[] readEntry(XmlPullParser parser)
        throws XmlPullParserException, IOException {
    parser.require(XmlPullParser.START_TAG, null, "entry");
    String title = null;
    String link = null;

    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.getEventType() != XmlPullParser.START_TAG) {
            continue;
        }

        String name = parser.getName();
        String rel = parser.getAttributeValue(null, "rel");

        if (name.equalsIgnoreCase("title")) {
            title = readTitle(parser);
        } else if (name.equalsIgnoreCase("link")
                && "alternate".equals(rel)) {
            link = readLink(parser);
        } else {
            skip(parser);
        }
    }
    return new String[] { title, link };
}

private String readLink(XmlPullParser parser)
        throws XmlPullParserException, IOException {
    String link = "";
    parser.require(XmlPullParser.START_TAG, null, "link");

    link = parser.getAttributeValue(null, "href");
    parser.nextTag();

    parser.require(XmlPullParser.END_TAG, null, "link");

    return link;
}

private String readTitle(XmlPullParser parser)
        throws XmlPullParserException, IOException {
    parser.require(XmlPullParser.START_TAG, null, "title");
    String title = readText(parser);
    parser.require(XmlPullParser.END_TAG, null, "title");
    return title;
}
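readEntry() and readTitle() also call readText() and skip(), which the question doesn't show. Assuming they match the helpers on the Android developer guide's XML parsing page, which this code appears to follow, they look roughly like the sketch below (written as static methods of a hypothetical FeedHelpers class). Note that skip() consumes the current element together with everything nested inside it. The code needs org.xmlpull.v1, which is built into Android; on a desktop JVM an implementation such as kxml2 has to be on the classpath.

```java
import java.io.IOException;

import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;

// Hypothetical holder class for the two helpers, modeled on the Android guide.
final class FeedHelpers {

    // Reads the text content of the current element and leaves the parser on its END_TAG.
    static String readText(XmlPullParser parser)
            throws IOException, XmlPullParserException {
        String result = "";
        if (parser.next() == XmlPullParser.TEXT) {
            result = parser.getText();
            parser.nextTag();
        }
        return result;
    }

    // Skips the current element and its whole subtree by counting nesting depth;
    // leaves the parser on the skipped element's END_TAG.
    static void skip(XmlPullParser parser)
            throws XmlPullParserException, IOException {
        if (parser.getEventType() != XmlPullParser.START_TAG) {
            throw new IllegalStateException("skip() must start on a START_TAG");
        }
        int depth = 1;
        while (depth != 0) {
            switch (parser.next()) {
                case XmlPullParser.END_TAG:
                    depth--;
                    break;
                case XmlPullParser.START_TAG:
                    depth++;
                    break;
                default:
                    break;
            }
        }
    }
}
```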

But no matter what I try, I'm not able to get the duration in seconds from <yt:duration seconds='300'/>.

Clearly it can't be read with methods like the ones above; I suspect namespace handling is required, but I'm not sure. Since I'm a bit lost here, any suggestion is much appreciated. Thanks.

====

Edit: here is what I tried in order to reach the yt:duration tag.

I added other checks before skip(parser);, e.g.:

} else if (name.equalsIgnoreCase("yt:")) {
    Utils.logger("i", "entering yt:", TAG);
    readDuration(parser);
}

and I replaced "yt:" with "yt" or "yt:duration", with no result.
I also tried

String namespace = parser.getNamespace();

and replacing name.equalsIgnoreCase... with namespace.equalsIgnoreCase..., but I never get the log entry, so I haven't even had a chance to try this:

private String readDuration(XmlPullParser parser)
        throws XmlPullParserException, IOException {
    parser.require(XmlPullParser.START_TAG, "yt", "duration");

    String seconds = parser.getAttributeValue(null, "seconds");
    parser.nextTag();

    parser.require(XmlPullParser.END_TAG, "yt", "duration");

    Utils.logger("i", "duration: " + seconds + " seconds", TAG);
    return seconds;
}

I added this on request; I'm not sure it's useful enough.
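Judging from the simplified entry, two details seem to explain why the log line never appears. First, yt:duration is not a direct child of entry but of media:group, so readEntry() reaches media:group first and skip(parser) consumes the whole group; the new name checks only ever see media:group, never yt:duration. Second, the namespace argument of parser.require() is a namespace URI, not a prefix, and with namespace processing off (the XmlPullParser default) getNamespace() always returns an empty string, so neither "yt" comparison can match. A minimal sketch that keeps namespace processing off and descends into media:group instead; MediaGroupReader and readMediaGroup() are hypothetical names, and the code needs org.xmlpull.v1 (built into Android; on a desktop JVM an implementation such as kxml2).

```java
import java.io.IOException;

import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;

// Hypothetical helper for readEntry(). Assumes namespace processing is OFF
// (the XmlPullParser default), so getName() returns qualified names such as "yt:duration".
final class MediaGroupReader {

    // Call with the parser on <media:group>. Returns the seconds attribute of the nested
    // <yt:duration> (null if absent) and leaves the parser on </media:group>.
    static String readMediaGroup(XmlPullParser parser)
            throws XmlPullParserException, IOException {
        // require() takes a namespace URI, not a prefix; null means "don't check".
        parser.require(XmlPullParser.START_TAG, null, "media:group");
        String seconds = null;
        int depth = 1;
        // Walk the whole media:group subtree, tracking depth to know where it ends.
        while (depth > 0) {
            int event = parser.next();
            if (event == XmlPullParser.START_TAG) {
                depth++;
                if (parser.getName().equals("yt:duration")) {
                    seconds = parser.getAttributeValue(null, "seconds");
                }
            } else if (event == XmlPullParser.END_TAG) {
                depth--;
            }
        }
        parser.require(XmlPullParser.END_TAG, null, "media:group");
        return seconds;
    }
}
```

In readEntry(), it would be called from a new else if (name.equals("media:group")) branch placed before the final skip(parser) branch, with the result stored in a new local variable.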

Solution

XmlPullParser can be namespace aware, but the option has to be enabled explicitly. Per the documentation of XmlPullParserFactory#setNamespaceAware:

Specifies that the parser produced by this factory will provide support for XML namespaces. By default the value of this is set to false.

You might want to try that option.
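A minimal sketch of that option. It scans the whole document and matches yt:duration by namespace URI plus local name, so nesting inside media:group does not matter. Assumptions: the yt prefix is bound to http://gdata.youtube.com/schemas/2007 (check the xmlns:yt declaration on the feed's root feed element; the simplified entry above omits the declarations, and a namespace-aware parser will typically reject undeclared prefixes), and an XmlPull implementation is available (built into Android; on a desktop JVM, e.g. kxml2). NamespaceAwareDurations and readDurations() are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;
import org.xmlpull.v1.XmlPullParserFactory;

final class NamespaceAwareDurations {

    // Assumed URI for the yt: prefix; verify against the feed's xmlns:yt declaration.
    static final String YT_NS = "http://gdata.youtube.com/schemas/2007";

    // Returns the seconds attribute of every yt:duration element, at any depth.
    static List<String> readDurations(Reader in)
            throws XmlPullParserException, IOException {
        XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
        factory.setNamespaceAware(true); // false by default
        XmlPullParser parser = factory.newPullParser();
        parser.setInput(in);

        List<String> durations = new ArrayList<>();
        for (int event = parser.getEventType();
                event != XmlPullParser.END_DOCUMENT;
                event = parser.next()) {
            // With namespaces on, getName() is the local name and getNamespace() the URI.
            if (event == XmlPullParser.START_TAG
                    && "duration".equals(parser.getName())
                    && YT_NS.equals(parser.getNamespace())) {
                durations.add(parser.getAttributeValue(null, "seconds"));
            }
        }
        return durations;
    }
}
```

On Android, calling parser.setFeature(XmlPullParser.FEATURE_PROCESS_NAMESPACES, true) on a parser obtained from Xml.newPullParser(), before setInput(), should have the same effect as the factory setting.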

Also, as mentioned in the comments, I traversed your XML with DOM without any issues. Below is the source code that prints all the duration values (note that it is meant to be run as a standalone Java program, not within the ADT):

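import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// The class name is arbitrary.
public class DurationPrinter {

    public static void main(String[] args) throws ParserConfigurationException,
            SAXException, IOException {
        // Download the feed and parse it into a DOM tree;
        // try-with-resources closes the stream afterwards.
        try (InputStream in = new URL(
                "https://gdata.youtube.com/feeds/api/standardfeeds/top_rated")
                .openStream()) {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse(in);
            traverse(document.getDocumentElement());
        }
    }

    // Visits every node depth-first and prints the "seconds" attribute of each
    // yt:duration element. getNodeName() returns the qualified name, prefix included.
    public static void traverse(Node node) {
        NodeList list = node.getChildNodes();
        for (int i = 0; i < list.getLength(); i++) {
            traverse(list.item(i));
        }

        if (node.getNodeName().equals("yt:duration")) {
            Element durationElement = (Element) node;
            System.out.println(durationElement.getAttribute("seconds"));
        }
    }
}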

Output I get:

With DOM I always prefer recursion (as above), since it keeps a full traversal of the tree simple and the code flexible.
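For comparison with the recursive walk, a namespace-aware DOM can do the search itself: once DocumentBuilderFactory#setNamespaceAware(true) is set (it is also false by default), getElementsByTagNameNS(namespaceURI, localName) returns every matching element at any depth, in document order. This is a swapped-in alternative, not the approach above. The yt URI http://gdata.youtube.com/schemas/2007 is an assumption to check against the feed's xmlns:yt declaration, and DomDurations is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class DomDurations {

    // Assumed URI for the yt: prefix; check the feed's xmlns:yt declaration.
    static final String YT_NS = "http://gdata.youtube.com/schemas/2007";

    static List<String> readDurations(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Needed so elements carry a namespace URI and local name for the *NS lookups.
        factory.setNamespaceAware(true);
        Document document = factory.newDocumentBuilder().parse(in);

        // Matches by namespace URI + local name, whatever prefix the document uses.
        NodeList nodes = document.getElementsByTagNameNS(YT_NS, "duration");
        List<String> durations = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            durations.add(((Element) nodes.item(i)).getAttribute("seconds"));
        }
        return durations;
    }
}
```

Matching on URI plus local name keeps working if a feed binds a different prefix to the same namespace, which a getNodeName().equals("yt:duration") check would not.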

If you want to know more about grouping these elements together, you can refer to my post here as well.
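As a hypothetical illustration of grouping (not the approach from that post), the sketch below pairs each entry's Atom title with its yt:duration seconds using a namespace-aware DOM, scoping each lookup to one entry element so values from different entries can't get mixed up. The Atom (http://www.w3.org/2005/Atom) and yt (http://gdata.youtube.com/schemas/2007) namespace URIs are assumptions to verify against the feed's root element; EntryGrouper and group() are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class EntryGrouper {

    // Assumed namespace URIs; check them against the feed's root element.
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";
    static final String YT_NS = "http://gdata.youtube.com/schemas/2007";

    // Returns one {title, seconds} pair per <entry>, mirroring readEntry()'s String[].
    static List<String[]> group(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document document = factory.newDocumentBuilder().parse(in);

        List<String[]> entries = new ArrayList<>();
        NodeList entryNodes = document.getElementsByTagNameNS(ATOM_NS, "entry");
        for (int i = 0; i < entryNodes.getLength(); i++) {
            Element entry = (Element) entryNodes.item(i);
            // Lookups are scoped to this entry's subtree; the Atom namespace
            // also keeps <media:title> from being picked up as the title.
            entries.add(new String[] {
                    firstText(entry, ATOM_NS, "title"),
                    firstAttribute(entry, YT_NS, "duration", "seconds") });
        }
        return entries;
    }

    private static String firstText(Element scope, String ns, String localName) {
        NodeList list = scope.getElementsByTagNameNS(ns, localName);
        return list.getLength() > 0 ? list.item(0).getTextContent() : null;
    }

    private static String firstAttribute(Element scope, String ns, String localName,
            String attribute) {
        NodeList list = scope.getElementsByTagNameNS(ns, localName);
        return list.getLength() > 0 ? ((Element) list.item(0)).getAttribute(attribute) : null;
    }
}
```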

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow