Question

I'm developing a web application using Jena and Tomcat. Quite frequently, usually after an exception is thrown, something happens to the TDB store, and certain calls to retrieve data from the model then cause this exception:

org.openjena.atlas.lib.InternalErrorException: Invalid id node for subject (null node): ([00000000000010D2], [000000000000003D], [0000000000000072])
    at com.hp.hpl.jena.tdb.lib.TupleLib.triple(TupleLib.java:130)
    at com.hp.hpl.jena.tdb.lib.TupleLib.triple(TupleLib.java:116)
    at com.hp.hpl.jena.tdb.lib.TupleLib.access$000(TupleLib.java:45)
    at com.hp.hpl.jena.tdb.lib.TupleLib$3.convert(TupleLib.java:77)
    at com.hp.hpl.jena.tdb.lib.TupleLib$3.convert(TupleLib.java:73)
    at org.openjena.atlas.iterator.Iter$4.next(Iter.java:301)
    at com.hp.hpl.jena.util.iterator.WrappedIterator.next(WrappedIterator.java:80)
    at com.hp.hpl.jena.util.iterator.Map1Iterator.next(Map1Iterator.java:47)
    at com.hp.hpl.jena.util.iterator.WrappedIterator.next(WrappedIterator.java:80)
    at com.hp.hpl.jena.rdf.model.impl.StmtIteratorImpl.next(StmtIteratorImpl.java:45)
    at com.hp.hpl.jena.rdf.model.impl.StmtIteratorImpl.nextStatement(StmtIteratorImpl.java:55)
    at com.example.myApp (myApp.java:123)

Why does this keep happening? I call model.close() at the end of every doPost/doGet method. Most of the time it works; I only run into the problem when something goes wrong and the server crashes or throws certain exceptions during development.

Once the store is in this state, is there any way to recover it, or is the only option to keep regular backups of the triple store in a file and re-read them?

Thanks in advance.

Solution

A common cause of this problem is concurrent updates. While a write is in progress, TDB must be locked against other writers and against readers. You can do this either by using transactions, or by using application-level locking to enforce a multiple-reader/single-writer (MRSW) policy yourself. You mention that problems occur in doPost/doGet methods: most web servers handle incoming requests via a thread pool, so you may have concurrent access to the store through that route even if you're not using threads yourself.
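
To make the application-level option concrete, here is a minimal sketch of an MRSW guard using only the JDK's ReentrantReadWriteLock. The GuardedStore class and the list of strings are hypothetical stand-ins for whatever object holds your TDB-backed model; they are not part of Jena. The point is the shape: every access takes the read or write lock, and the unlock sits in a finally block so that an exception thrown mid-request cannot leave the lock held.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical wrapper, not a Jena class: the List<String> stands in for the model.
class GuardedStore {
    // One shared lock: many concurrent readers, or exactly one writer.
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> statements = new ArrayList<>();

    // Write path: exclusive access while the store is modified.
    public void add(String statement) {
        lock.writeLock().lock();
        try {
            statements.add(statement);
        } finally {
            // Released even if the update throws.
            lock.writeLock().unlock();
        }
    }

    // Read path: shared access; returns a copy so callers never
    // iterate over the live data outside the lock.
    public List<String> snapshot() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(statements);
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedStore store = new GuardedStore();
        Thread[] workers = new Thread[8];
        // Simulate a servlet thread pool: 8 threads, 100 writes each.
        for (int t = 0; t < workers.length; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 100; i++) {
                    store.add("worker-" + id + "-item-" + i);
                    store.snapshot(); // interleave reads with writes
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        System.out.println(store.snapshot().size()); // prints 800
    }
}
```

In a servlet, the lock has to be shared by all request threads (for example, held in a single application-wide object), otherwise each request would lock only its own private copy. Jena also provides its own critical-section calls on the model (enterCriticalSection with Lock.READ or Lock.WRITE, paired with leaveCriticalSection in a finally block), which follow the same pattern; check the documentation for the version you are running before relying on them.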

Also be sure to track the latest Jena versions. There are occasional bug fixes that address index corruption issues, though as far as I know there are no reported problems outstanding at the moment. You can always check the Jena JIRA to see if there are any relevant open issues, or, if you have a reproducible test case, please open a new ticket.

In general, index corruption should not happen. However, it's probably best to build backup and data recovery into your architecture, just as you would for any other data source. I don't know of any way to repair a corrupted index.

Licensed under: CC-BY-SA with attribution