I am sending SPARQL queries as asynchronous requests to a SPARQL endpoint, currently DBpedia using the dotNetRDF library. While simpler queries usually work, more complex queries sometimes result in timeouts.
I am looking for a way to handle the timeouts by capturing some event when they occur.
I am sending my queries by using one of the asynchronous QueryWithResultSet
overloads of the SparqlRemoteEndpoint
class.
As described for SparqlResultsCallback
, the state
object will be replaced with an AsyncError
instance if the asynchronous request failed. This does indicate that there was a timeout, however it seems that it only does so 10 minutes after the request was sent. When my timeout is, for example, 30 seconds, I would like to know 30 seconds later whether the request was successful. (35 seconds are ok, too, but you get the idea.)
Here is a sample application that sends two requests, the first of which is very simple and likely to succeed within the timeout (here set to 120 seconds), while the second one is rather complex and may easily fail on DBpedia:
using System;
using System.Collections.Concurrent;
using VDS.RDF;
using VDS.RDF.Query;
public class TestTimeout
{
private static string FormatResults(SparqlResultSet results, object state)
{
var result = new System.Text.StringBuilder();
result.AppendLine(DateTime.Now.ToLongTimeString());
var asyncError = state as AsyncError;
if (asyncError != null) {
result.AppendLine(asyncError.State.ToString());
result.AppendLine(asyncError.Error.ToString());
} else {
result.AppendLine(state.ToString());
}
if (results == null) {
result.AppendLine("results == null");
} else {
result.AppendLine("results.Count == " + results.Count.ToString());
}
return result.ToString();
}
public static void Main(string[] args)
{
Console.WriteLine("Launched ...");
Console.WriteLine(DateTime.Now.ToLongTimeString());
var output = new BlockingCollection<string>();
var ep = new SparqlRemoteEndpoint(new Uri("http://dbpedia.org/sparql"));
ep.Timeout = 120;
Console.WriteLine("Server == " + ep.Uri.AbsoluteUri);
Console.WriteLine("HTTP Method == " + ep.HttpMode);
Console.WriteLine("Timeout == " + ep.Timeout.ToString());
string query = "SELECT DISTINCT ?a\n"
+ "WHERE {\n"
+ " ?a <http://www.w3.org/2000/01/rdf-schema#label> ?b.\n"
+ "}\n"
+ "LIMIT 10\n";
ep.QueryWithResultSet(query,
(results, state) => {
output.Add(FormatResults(results, state));
},
"Query 1");
query = "SELECT DISTINCT ?v5 ?v8\n"
+ "WHERE {\n"
+ " {\n"
+ " SELECT DISTINCT ?v5\n"
+ " WHERE {\n"
+ " ?v6 ?v5 ?v7.\n"
+ " FILTER(regex(str(?v5), \"[/#]c[^/#]*$\", \"i\")).\n"
+ " }\n"
+ " OFFSET 0\n"
+ " LIMIT 20\n"
+ " }.\n"
+ " OPTIONAL {\n"
+ " ?v5 <http://www.w3.org/2000/01/rdf-schema#label> ?v8.\n"
+ " FILTER(lang(?v8) = \"en\").\n"
+ " }.\n"
+ "}\n"
+ "ORDER BY str(?v5)\n";
ep.QueryWithResultSet(query,
(results, state) => {
output.Add(FormatResults(results, state));
},
"Query 2");
Console.WriteLine("Queries sent.");
Console.WriteLine(DateTime.Now.ToLongTimeString());
Console.WriteLine();
string result = output.Take();
Console.WriteLine(result);
result = output.Take();
Console.WriteLine(result);
Console.ReadLine();
}
}
When I run this, I reproducibly get an output like the following:
13:13:23
Server == http://dbpedia.org/sparql
HTTP Method == GET
Timeout == 120
Queries sent.
13:13:25
13:13:25
Query 1
results.Count == 10
13:23:25
Query 2
VDS.RDF.Query.RdfQueryException: A HTTP error occurred while making an asynchron
ous query, see inner exception for details ---> System.Net.WebException: Der Rem
oteserver hat einen Fehler zurückgegeben: (504) Gatewaytimeout.
bei System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult)
bei VDS.RDF.Query.SparqlRemoteEndpoint.<>c__DisplayClass13.<QueryWithResultSe
t>b__11(IAsyncResult innerResult)
--- Ende der internen Ausnahmestapelüberwachung ---
results == null
Obviously, the exact times will be different, but the crucial point is that the error message based on the second query is received approximately 10 minutes after the request was sent, nowhere near the 2 minutes set for the timeout.
Am I using dotNetRDF incorrectly here, or is it intentional that I have to run an additional timer to measure the timeout myself and react on my own unless any response has been received meanwhile?