Question

I need to load a very large ontology, represented as an N-Triples file (1 GB), into the OpenRDF Sesame application. I'm using the Workbench interface to do that. I know this file is too big to be loaded in one request, so to get around that I split it into 100 MB chunks. But I still get an error from the OpenRDF Sesame server:

HTTP ERROR 500

Problem accessing /openrdf-workbench/repositories/business/add. Reason:

    Unbuffered entity enclosing request can not be repeated.
Caused by:

org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
 at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:487)

Does anyone have good knowledge of OpenRDF Sesame, or of another ontology manager I could use for my task?

Thanks a lot for your input

K.


Solution

The Sesame Workbench is really not the ideal tool for this kind of task, although I would expect it to cope with 100 MB files. It might be that the Tomcat on which you run Sesame has a POST size limit set? You could ask around on Sesame's mailing list; there are quite a few knowledgeable people there as well. But here are two possible ways to get things done:

One way to handle this is to do your upload programmatically, using Sesame's Repository API. Have a look at the user documentation on the Sesame website for code examples.
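
For illustration, here is a minimal sketch using the Sesame 2.x Repository API. The server URL, the repository ID ("business", taken from your error message) and the file path are assumptions you would need to adapt:

    import java.io.File;

    import org.openrdf.repository.RepositoryConnection;
    import org.openrdf.repository.http.HTTPRepository;
    import org.openrdf.rio.RDFFormat;

    public class BulkLoad {
        public static void main(String[] args) throws Exception {
            // Connect to the remote repository exposed by the Sesame server
            HTTPRepository repo = new HTTPRepository(
                    "http://localhost:8080/openrdf-sesame", "business");
            repo.initialize();

            RepositoryConnection con = repo.getConnection();
            try {
                // Stream the N-Triples file into the store; this bypasses
                // the Workbench upload form entirely
                con.add(new File("/data/ontology.nt"),
                        "http://example.org/", RDFFormat.NTRIPLES);
            } finally {
                con.close();
            }
            repo.shutDown();
        }
    }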

Alternatively, if you are using a Sesame native store, you could use a 'dirty' workaround via Sesame's command line console: create a local native triple store and upload your data to that local store (this should be much quicker because no HTTP communication is involved). Then shut down your Sesame server, copy the data files of the local native store over the store's data files on your server, and restart.
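
If you prefer to do that local load programmatically rather than through the console, a sketch along these lines should work with the Sesame 2.x API (the data directory and file path are again assumptions):

    import java.io.File;

    import org.openrdf.repository.Repository;
    import org.openrdf.repository.RepositoryConnection;
    import org.openrdf.repository.sail.SailRepository;
    import org.openrdf.rio.RDFFormat;
    import org.openrdf.sail.nativerdf.NativeStore;

    public class LocalLoad {
        public static void main(String[] args) throws Exception {
            // Open (or create) a native store in a local directory; no HTTP
            // communication is involved, so large files load much faster
            Repository repo = new SailRepository(
                    new NativeStore(new File("/tmp/localstore")));
            repo.initialize();

            RepositoryConnection con = repo.getConnection();
            try {
                con.add(new File("/data/ontology.nt"),
                        "http://example.org/", RDFFormat.NTRIPLES);
            } finally {
                con.close();
            }
            repo.shutDown();
            // The files under /tmp/localstore can then be copied over the
            // server-side store's data directory while the server is stopped
        }
    }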

OTHER TIPS

I had the same problem. When I tried to upload a "large" RDF file (around 40 MB), the upload process failed with the error:

Unbuffered entity enclosing request can not be repeated.

I tried other versions of Tomcat and of Sesame, but without success. Then I tried to use the Sesame console with a local repository (not localhost on the Tomcat server, as Jeen suggests in another answer), and it showed me another error:

Malformed document: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK. [line 1, column 1]

So I think the entity-limit error is masked somewhere in Tomcat by the 'Unbuffered entity' error.

Then I found the topic What's causing these ParseError exceptions when reading off an AWS SQS queue in my Storm cluster and added this statement before starting Tomcat:

export JAVA_OPTS="${JAVA_OPTS} -Djdk.xml.entityExpansionLimit=0"

This statement disables the entity expansion limit in the XML parser (the default is 64,000, as the error message says). After this step I was able to load "large" RDF files (tested on 40-800 MB).
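
As far as I know, the same JDK limit can also be raised from Java code by setting the system property before the first XML parser is created in the JVM; a minimal sketch, equivalent to the JAVA_OPTS flag above:

    public class DisableEntityLimit {
        public static void main(String[] args) {
            // Must run before any JAXP parser is instantiated;
            // 0 means "no limit", matching -Djdk.xml.entityExpansionLimit=0
            System.setProperty("jdk.xml.entityExpansionLimit", "0");
            // ... parse or upload the RDF document after this point ...
        }
    }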

I don't know exactly what task you hope to achieve, but you may want to check here for a list of scalable triple stores with informal (mainly self-claimed) scalability results. On that list, Sesame reports handling only 70M statements (not that many; it might be the cause of your trouble).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow