How can I improve the performance of SPARQL queries while using inferencing?

https://stackoverflow.com/questions/12820873

06-07-2021

Question

I want to improve the performance of my SPARQL queries, and I need to be able to run all types of SPARQL queries. My data graph contains 17,500,000 triples in total. I also have another graph that contains only knowledge, expressed with sameAs and subClassOf properties; this graph has around 50,000,000 triples. I am using on-the-fly inferencing in the SPARQL queries.

I am using Virtuoso as the database. It has built-in inferencing functionality.

When I run a simple query with inferencing, it takes 80 seconds; without inferencing, it takes 10 seconds.

SPARQL query:

 DEFINE input:inference 'myrule'
 SELECT DISTINCT ?uri1 ?uri2
 FROM <GRAPH_NAME>
 WHERE {
   ?uri1 rdf:type ezdi:Aspirin .
   ?patient ezdi:is_treated_with ?uri1 .
   ?patient rdf:type ezdi:Patient .
   ?uri2 rdf:type ezdi:Hypertension .
   ?patient ezdi:is_suffering_with ?uri2 .
   ?patient rdf:type ezdi:Patient
 }
 ORDER BY ?patient

I have created all the indexes that Virtuoso provides. The system has 32 GB of RAM, and I have set NumberOfBuffers in the virtuoso.ini file.

I don't know what the issue with inferencing is, but I have to use inferencing in the SPARQL query.

If you know something, please share your ideas.

Thank You

Solution

An ontology of 5M triples is quite large, though strictly speaking, that's not problematic. Reasoning performance is far more closely tied to the expressivity of your ontology than to its size. You could create an ontology with several orders of magnitude fewer triples that would be harder to reason with.
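As a rough illustration of why the cost depends on what has to be inferred rather than on raw triple count, here is a minimal Python sketch of subclass expansion. It is not Virtuoso's implementation, and every class name other than ezdi:Aspirin is hypothetical. It only shows that a single `rdf:type` pattern has to be matched against the target class plus everything below it in the hierarchy.

```python
def subclasses_of(target, sub_class_of):
    """Return target plus every class reachable downwards via rdfs:subClassOf."""
    children = {}
    for sub, sup in sub_class_of:
        children.setdefault(sup, set()).add(sub)
    seen, stack = {target}, [target]
    while stack:
        for sub in children.get(stack.pop(), ()):
            if sub not in seen:
                seen.add(sub)
                stack.append(sub)
    return seen


# Hypothetical hierarchy: only ezdi:Aspirin appears in the original query.
hierarchy = [
    ("ezdi:LowDoseAspirin", "ezdi:Aspirin"),
    ("ezdi:BufferedAspirin", "ezdi:Aspirin"),
    ("ezdi:EntericCoatedLowDoseAspirin", "ezdi:LowDoseAspirin"),
]

classes = sorted(subclasses_of("ezdi:Aspirin", hierarchy))
# One "?uri1 rdf:type ezdi:Aspirin" pattern now has to cover all of these.
print(len(classes), classes)
```

With a deep or wide hierarchy, each type pattern in the query fans out in this way, which is one plausible reason an inferencing query runs slower than the same query without it.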

With that said, there's not much I can specifically suggest. Virtuoso-specific tuning is best left to its developers, so you might get some traction on their mailing list.

It appears that you're using a custom inferencing rule set, 'myrule', though in the comments you also claim RDFS and sameAs. You probably need to figure out what reasoning you're actually using and which profile (RDFS or OWL 2 QL, RL, EL, DL) your ontology falls into, and learn a little bit about how reasoning actually works. Further, equality reasoning, which you claim to be using in addition to RDFS, is difficult. It might be possible for Virtuoso to compute the equivalence relations eagerly, which could reduce the overhead of the query, but again, that is something you should take up with them on their mailing list.
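To make "computing the equivalence relations eagerly" concrete, here is a minimal Python sketch that uses a union-find structure to pick one representative per sameAs equivalence class and rewrites the data once, ahead of query time. This is an illustration of the general idea only, not a description of what Virtuoso does or offers. The `ex:` IRIs are made up for the example.

```python
def canonical_map(same_as_pairs):
    """Map every IRI to one representative of its owl:sameAs equivalence class."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smaller IRI as the representative.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    return {x: find(x) for x in list(parent)}


def canonicalize(triples, canon):
    """Rewrite subjects and objects to their representatives; predicates stay as-is."""
    return {(canon.get(s, s), p, canon.get(o, o)) for s, p, o in triples}


# Hypothetical data: three IRIs that all denote the same drug.
same_as = [("ex:aspirin", "ex:ASA"), ("ex:ASA", "ex:acetylsalicylic_acid")]
data = [
    ("ex:p1", "ezdi:is_treated_with", "ex:ASA"),
    ("ex:p2", "ezdi:is_treated_with", "ex:acetylsalicylic_acid"),
]

canon = canonical_map(same_as)
rewritten = canonicalize(data, canon)
print(rewritten)
```

After this one-time rewrite, a query only has to look up the representative IRI instead of expanding every sameAs alternative while it runs. The trade-off is that the rewritten data must be refreshed whenever the sameAs statements change.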

Reasoning is not easy by any means, and there's no silver bullet for magically making reasoning faster beyond using a simpler (i.e., less expressive) ontology, less data, or both.

Lastly, you might try other databases which are designed for reasoning, such as OWLIM or Stardog. Not all databases are created equal, and it's entirely possible you've encoded something in your TBox which Virtuoso might not handle well, but could be handled easily by another system.

OTHER TIPS

There are many factors which could lead to the performance issue you describe. The most common is an error in the NumberOfBuffers setting in the INI file, which we cannot see here and so cannot diagnose.
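As a sanity check for that setting, the small Python sketch below prints the buffer values commonly quoted in the comments of the sample virtuoso.ini for a given amount of free system memory. The figures are reproduced from memory and should be confirmed against the INI file and documentation shipped with your Virtuoso version. The helper name `ini_lines` is hypothetical.

```python
# (NumberOfBuffers, MaxDirtyBuffers) per GB of free system memory, as quoted in
# the comments of the sample virtuoso.ini. Assumption: verify against your copy.
RECOMMENDED_BUFFERS = {
    2: (170000, 130000),
    4: (340000, 250000),
    8: (680000, 500000),
    16: (1360000, 1000000),
    32: (2720000, 2000000),
}


def ini_lines(ram_gb):
    """Return the [Parameters] lines for the given amount of free RAM in GB."""
    number_of_buffers, max_dirty_buffers = RECOMMENDED_BUFFERS[ram_gb]
    return [
        "[Parameters]",
        f"NumberOfBuffers = {number_of_buffers}",
        f"MaxDirtyBuffers = {max_dirty_buffers}",
    ]


print("\n".join(ini_lines(32)))
```

Each buffer caches one 8 KB database page, so these values dedicate roughly two thirds of RAM to the page cache. If the 32 GB machine also runs other services, a lower row may be the safer choice. The server has to be restarted for a changed value to take effect.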

Questions specifically regarding Virtuoso are generally best raised on the public OpenLink Discussion Forums, the Virtuoso Users mailing list, or through a confidential Support Case. If you bring this there, we should be able to help you in more detail.

Licensed under: CC-BY-SA with attribution