When you do a search for, e.g., "History of Berlin", you're requesting a URL like
and you're getting back an XML result like this:
<?xml version="1.0" encoding="utf-8"?>
<ArrayOfResult
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://lookup.dbpedia.org/">
  <Result>
    <Label>Museum für Naturkunde</Label>
    <URI>http://dbpedia.org/resource/Museum_für_Naturkunde</URI>
    <Description></Description>
    <Classes></Classes>
    <Categories></Categories>
    <Templates></Templates>
    <Redirects></Redirects>
    <Refcount>155</Refcount>
  </Result>
  <Result>
    <Label>History of Berlin</Label>
    <URI>http://dbpedia.org/resource/History_of_Berlin</URI>
    <Description>
      Berlin is the capital city of Germany. Berlin is a young city by European standards, founded in the 12th century.
    </Description>
    <Classes></Classes>
    <Categories>
      <Category>
        <Label>History of Berlin</Label>
        <URI>http://dbpedia.org/resource/Category:History_of_Berlin</URI>
      </Category>
      <Category>
        <Label>History of Germany by location</Label>
        <URI>http://dbpedia.org/resource/Category:History_of_Germany_by_location</URI>
      </Category>
    </Categories>
    <Templates></Templates>
    <Redirects></Redirects>
    <Refcount>14</Refcount>
  </Result>
</ArrayOfResult>
You're right that there are URI elements with category URIs, e.g.,
<URI>http://dbpedia.org/resource/Category:History_of_Berlin</URI>
but what you should note is that, counting from the root of the document, those are ArrayOfResult/Result/Categories/Category/URI elements, whereas the elements you want are ArrayOfResult/Result/URI elements. You just need to process your XML a bit differently: don't collect the content of every URI element, only the content of the URI elements that are direct children of Result elements.
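I'm not all that familiar with SAX parsing, but I think the important point is that once you've entered a Result element, you should only grab the text of a URI element if you haven't entered another child element of Result (such as Categories). In other words, the URI's immediate parent must be the Result itself.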
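A minimal SAX sketch of that idea, assuming Java's built-in `javax.xml.parsers` SAX parser (the class `ResultUriHandler` and its `resultUris` method are hypothetical names). Rather than a flag for "inside Result", it keeps a stack of open element names, so when a URI element closes, the top of the stack is its parent; the text is kept only when that parent is Result. The parser is made namespace-aware so that `localName` is filled in despite the default namespace.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: keeps URI text only when the URI's parent is Result.
class ResultUriHandler extends DefaultHandler {
    private final Deque<String> open = new ArrayDeque<>(); // open elements, innermost on top
    private final StringBuilder text = new StringBuilder(); // text of the current element
    private final List<String> uris = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        open.push(localName);
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so accumulate.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop(); // after this, peek() is the parent of the closing element
        if ("URI".equals(localName) && "Result".equals(open.peek())) {
            uris.add(text.toString().trim());
        }
        text.setLength(0);
    }

    static List<String> resultUris(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so localName is populated
            ResultUriHandler handler = new ResultUriHandler();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.uris;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse lookup result", e);
        }
    }
}
```

A category URI closes while Category is on top of the stack, so it is skipped; only ArrayOfResult/Result/URI text ends up in the list. A depth counter relative to Result would work just as well; the stack simply makes the parent check explicit.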