Question

I am trying to work with the RDF dump of DBLP, available from DBLP in RDF. I attempted to use Jena's rdfcat to convert that file into Turtle format:

rdfcat -x dblp-2006-02-06.rdf -out t > dblp.ttl

Unfortunately, this aborts with the following error message:

Exception in thread "main" org.apache.jena.riot.RiotException: [line: 378, col: 147] {E202} Expecting XML start or end element(s). String data "
????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????" not allowed. Maybe a striping error.
        at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.error(ErrorHandlerFactory.java:128)
        at org.apache.jena.riot.lang.LangRDFXML$ErrorHandlerBridge.error(LangRDFXML.java:246)
…

As far as I could learn from another question, "What is a striping error?", a striping error occurs in RDF/XML parsing when the hierarchical XML structure does not conform to RDF/XML's even/odd rule, i.e. node elements and property elements must alternate. Looking into that file, the relevant part looks like this:

<rdf:Description rdf:about="http://www.informatik.uni-trier.de/~ley/db/journals/ac/ac40.html#YousifTD95"><dc:identifier>journals/ac/YousifTD95</dc:identifier><dc:date>2002-01-03</dc:date><rdf:type rdf:resource="http://sw.deri.org/~aharth/2004/07/dblp/dblp.owl#Article"/>
<dc:creator><foaf:Person rdf:nodeID="MazinSYousif"><foaf:name>Mazin S. Yousif</foaf:name></foaf:Person></dc:creator>
<dc:creator><foaf:Person rdf:nodeID="MatthewThazhuthaveetil"><foaf:name>Matthew Thazhuthaveetil</foaf:name></foaf:Person></dc:creator>
<dc:creator><foaf:Person rdf:nodeID="ChitaRDas"><foaf:name>Chita R. Das</foaf:name></foaf:Person></dc:creator>
<dc:title rdf:parseType="Literal">Cache Coherence in Multiprocessors: A Survey.</dc:title>
<pages>127-179</pages>
<year>1995</year>
<volume>40</volume>
<journal>Advances in Computers</journal>
</rdf:Description>

Line 378 seems to be the one with Matthew Thazhuthaveetil, according to Nano. However, I fail to see how that line could be structurally problematic, in particular when comparing it to the lines around it. Is there really a structural problem there (and if so, what is it), or is the error message misleading?


Solution

I just tried this myself with Apache Jena 2.11.1 and it worked fine. Have you tried `riot --validate`?

The error is curious:

Exception in thread "main" org.apache.jena.riot.RiotException: [line: 378, col: 147] {E202} Expecting XML start or end element(s). String data "
????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????" not allowed. Maybe a striping error.

The string data it complains about is not shown as printable characters (it only comes out as `?`), which is mysterious.
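One way to find out what those `?` stand for is to print the code points around the reported position. This is only a sketch: `read_line` and `describe_columns` are hypothetical helpers, and it assumes the file is UTF-8 (undecodable bytes show up as U+FFFD) and that the parser's column numbers count characters starting from 1.

```python
import unicodedata


def read_line(path, line_no, encoding="utf-8"):
    """Return line `line_no` (1-based) of `path`, decoding leniently."""
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising;
    # newline="" keeps the file's line endings untranslated.
    with open(path, encoding=encoding, errors="replace", newline="") as f:
        for i, line in enumerate(f, start=1):
            if i == line_no:
                return line
    raise ValueError(f"{path} has fewer than {line_no} lines")


def describe_columns(line, col_no, context=10):
    """List (column, code point, Unicode name) around 1-based column `col_no`."""
    start = max(col_no - 1 - context, 0)
    report = []
    for column, ch in enumerate(line[start:col_no + context], start=start + 1):
        # Control characters have no Unicode name, so a default is supplied.
        report.append((column, f"U+{ord(ch):04X}", unicodedata.name(ch, "<no name>")))
    return report
```

For example, `describe_columns(read_line("dblp-2006-02-06.rdf", 378), 147)` would list the characters around line 378, column 147; a ZERO WIDTH SPACE, a control character (reported as `<no name>`) or U+FFFD there would be a candidate for the junk.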

The error simply means that the RDF/XML contains non-whitespace characters outside a property element, for example directly inside a node element such as rdf:Description. That suggests the file might contain invisible junk, perhaps trailing after the </dc:creator>.
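To test that idea independently of Jena, a rough striping check can be written with Python's built-in SAX parser. This is a simplified sketch, not a full RDF/XML validator: `StripeChecker` and `check_striping` are hypothetical names; it assumes the root is rdf:RDF (otherwise the root is treated as a node element), skips rdf:parseType="Literal" content (as used by dc:title above), treats rdf:parseType="Resource" like a node element, and only reports non-whitespace text directly inside rdf:RDF or a node element, so it does not catch every striping error.

```python
import io
import xml.sax
import xml.sax.handler

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class StripeChecker(xml.sax.handler.ContentHandler):
    """Report non-whitespace text where RDF/XML only allows elements (simplified)."""

    def __init__(self):
        super().__init__()
        self.stack = []          # "rdf", "node" or "prop" for each open element
        self.literal_depth = 0   # > 0 while inside a parseType="Literal" property
        self.problems = []       # (line, 1-based column, stripped text)
        self.locator = None

    def setDocumentLocator(self, locator):
        self.locator = locator

    def startElementNS(self, name, qname, attrs):
        if self.literal_depth:   # XML literal content: structure is not checked
            self.literal_depth += 1
            return
        parent = self.stack[-1] if self.stack else None
        if parent is None:
            kind = "rdf" if name == (RDF_NS, "RDF") else "node"
        elif parent in ("rdf", "prop"):
            kind = "node"        # children of rdf:RDF and of properties are nodes
        else:                    # children of a node element are properties
            parse_type = attrs.get((RDF_NS, "parseType"))
            if parse_type == "Literal":
                self.literal_depth = 1
                return
            # parseType="Resource": its children are properties, like a node's
            kind = "node" if parse_type == "Resource" else "prop"
        self.stack.append(kind)

    def endElementNS(self, name, qname):
        if self.literal_depth:
            self.literal_depth -= 1
            return
        self.stack.pop()

    def characters(self, content):
        if self.literal_depth or not self.stack:
            return
        # Only whitespace may appear between elements inside rdf:RDF or a node.
        if self.stack[-1] in ("rdf", "node") and content.strip():
            loc = self.locator
            line = loc.getLineNumber() if loc else None
            col = loc.getColumnNumber() if loc else None
            if col is not None:
                col += 1         # expat counts columns from 0
            self.problems.append((line, col, content.strip()))


def check_striping(source):
    """Check a filename, binary file object, or bytes; return the problem list."""
    if isinstance(source, bytes):
        source = io.BytesIO(source)
    handler = StripeChecker()
    parser = xml.sax.make_parser()
    parser.setFeature(xml.sax.handler.feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(source)
    return handler.problems
```

Running `check_striping("dblp-2006-02-06.rdf")` would return `(line, column, text)` tuples for any stray text. If expat itself rejects the file, it raises `xml.sax.SAXParseException` with the position instead.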

I don't see anything like that in the file, so it really feels like an I/O error somewhere.

Licensed under: CC-BY-SA with attribution