Question

I use Jena and TDB to store RDF, and I want to do some inference on it. But the RDF data is large, and Jena's OWL reasoner has to load all the data into memory. So I want to find a reasoner that can work without loading all the data into memory — is there one?

Solution

If you are prepared to work with a subset of OWL, there are things you can do in a stream-processing fashion, without loading all your RDF data into memory, that will materialise all the inferred triples.
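To make the idea concrete, here is a minimal sketch (not Jena's implementation — the names and data are illustrative) of streaming inference for one RDFS rule. It assumes the schema (the subclass hierarchy) is small enough to hold in memory, while the instance data streams through one triple at a time:

```python
# Minimal sketch of streaming RDFS inference for rule rdfs9:
#   (x rdf:type C1), (C1 rdfs:subClassOf C2)  =>  (x rdf:type C2)
# The small schema is kept in memory; instance data is streamed.
# All names here are illustrative, not from the question.

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def subclass_closure(schema_triples):
    """Build the transitive closure of rdfs:subClassOf from a small schema."""
    supers = {}
    for s, p, o in schema_triples:
        if p == SUBCLASS:
            supers.setdefault(s, set()).add(o)
    # Naive fixpoint iteration; fine for a small in-memory schema.
    changed = True
    while changed:
        changed = False
        for c, sup in supers.items():
            for parent in list(sup):
                for grand in supers.get(parent, ()):
                    if grand not in sup:
                        sup.add(grand)
                        changed = True
    return supers

def infer_stream(triples, supers):
    """Yield each input triple plus the rdf:type triples it entails.
    Only one input triple is held in memory at a time."""
    for s, p, o in triples:
        yield (s, p, o)
        if p == RDF_TYPE:
            for sup in supers.get(o, ()):
                yield (s, RDF_TYPE, sup)

schema = [(":Dog", SUBCLASS, ":Animal"), (":Animal", SUBCLASS, ":Thing")]
data = [(":rex", RDF_TYPE, ":Dog")]
out = list(infer_stream(data, subclass_closure(schema)))
# ':rex' now also comes out typed as :Animal and :Thing
```

Note that because the stream is processed triple by triple, the memory footprint depends only on the schema size, not on the size of the data — which is exactly why only a subset of OWL can be handled this way.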

As an example, have a look at RIOT's infer command:

Source code here:

It is trivial to take RIOT's infer and run it in parallel with something like MapReduce; an example is here:
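The referenced example runs on Hadoop; to illustrate why the parallelisation is trivial, here is a toy, single-process simulation (illustrative names and data, not the actual example code). Each "mapper" applies the streaming rule to its own input split, and the "reducer" merely de-duplicates:

```python
# Toy simulation of map-side streaming inference plus reduce-side
# de-duplication. In a real Hadoop job each mapper would run the
# streaming rule application over its input split; here the "splits"
# are just Python lists. Schema and data are illustrative.

SUPERS = {":Dog": {":Animal"}}  # precomputed small schema, shipped to every mapper

def map_phase(split):
    """Apply rule rdfs9 to one input split, emitting asserted + inferred triples."""
    out = []
    for s, p, o in split:
        out.append((s, p, o))
        if p == "rdf:type":
            for sup in SUPERS.get(o, ()):
                out.append((s, "rdf:type", sup))
    return out

def reduce_phase(mapped_outputs):
    """Union the mapper outputs, removing duplicate triples."""
    result = set()
    for out in mapped_outputs:
        result.update(out)
    return result

splits = [[(":rex", "rdf:type", ":Dog")], [(":fido", "rdf:type", ":Dog")]]
triples = reduce_phase(map_phase(s) for s in splits)
```

Because the rule fires on one triple at a time, the splits are completely independent and no shuffling of data between mappers is needed — only the final de-duplication.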

A different approach, which uses MapReduce to apply the RDFS and OWL ter Horst rules and materialise all the derived statements, is here:

Perhaps you can look at the parts of OWL you are interested in and check whether they can be handled in a streaming fashion. If so, you could take RIOT's infer and extend it to cover those parts. That would be a nice contribution to Apache Jena (get in touch on the jena-dev mailing list if you want to do that).

WebPIE is a clever and interesting project but, as you can see, a bit more complex, and it is a research project (with all that this implies from a long-term support and maintenance point of view). However, if it is OWL ter Horst you want or need, WebPIE would do. You could even put in the effort to fork WebPIE and contribute it to an open source project, if others are interested in using it.

You might also be interested in looking at Ymris (but this project is currently sleeping... zzzzz):

OTHER TIPS

Not really. DL reasoning is computationally difficult even at small scale. With lots of data, it's just not going to work with the existing approaches. Doing it over secondary storage is still an open research problem, as far as I know.

However, the various profiles of OWL exist to address this issue. They all have different computational complexities, which are all 'easier' than DL, making them much more amenable to reasoning at scale. In particular, OWL QL is designed for query-time reasoning, which in my experience tends to have a very small memory footprint, and OWL RL can be implemented with a standard rule reasoner.
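The query-time style of reasoning that OWL QL enables can be illustrated by query rewriting: instead of materialising inferred triples up front, a query for instances of a class is expanded into a union over that class and its subclasses, so only the (small) class hierarchy needs to live in memory. A minimal sketch, with purely illustrative class names and data — not the API of any particular QL reasoner:

```python
# Sketch of query-time (OWL QL-style) reasoning by query rewriting:
# a class query is expanded into a union over the class and all its
# subclasses, and only the small class hierarchy is held in memory.
# The instance data itself stays un-materialised. Illustrative names only.

SUBCLASSES = {":Animal": {":Dog", ":Cat"}, ":Dog": set(), ":Cat": set()}

def classes_to_query(cls):
    """Return cls plus all of its (transitive) subclasses."""
    seen, stack = {cls}, [cls]
    while stack:
        for sub in SUBCLASSES.get(stack.pop(), ()):
            if sub not in seen:
                seen.add(sub)
                stack.append(sub)
    return seen

def query_instances(data, cls):
    """Answer '?x rdf:type cls' against raw data with no stored inferences."""
    wanted = classes_to_query(cls)
    return {s for s, p, o in data if p == "rdf:type" and o in wanted}

data = [(":rex", "rdf:type", ":Dog"), (":tom", "rdf:type", ":Cat")]
answers = query_instances(data, ":Animal")  # finds both, without materialisation
```

The trade-off versus materialisation is that each query does a little more work, but nothing ever needs to be loaded or stored beyond the schema — which is what keeps the memory footprint small.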

So if you don't need to use DL, then I'd go with a tool that supports one of the profiles and you should get pretty good mileage out of that.

For reference, you might find this document about the computational complexities of the various OWL dialects interesting.

You may want to try GRAKN.AI; it performs reasoning in real time over persisted data in distributed systems.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow