Question

Hi, I was making a crawler for a site. After about 3 hours of crawling, my app stopped on a WebException. Below is my code in C#. client is a predefined WebClient object that is disposed every time gameDoc has finished being processed. gameDoc is an HtmlDocument object (from HtmlAgilityPack).

while (retrygamedoc)
{
    try
    {
        gameDoc.LoadHtml(client.DownloadString(url)); // this line caused the exception
        retrygamedoc = false;
    }
    catch
    {
        client.Dispose();
        client = new WebClient();

        retrygamedoc = true;
        Thread.Sleep(500);
    }
}

I tried the code below (to keep the WebClient fresh) from this answer:

while (retrygamedoc)
{
    try
    {
        using (WebClient client2 = new WebClient())
        {
            gameDoc.LoadHtml(client2.DownloadString(url)); // this line caused the exception
            retrygamedoc = false;
        }
    }
    catch
    {
        retrygamedoc = true;
        Thread.Sleep(500);
    }
}

but the result is still the same. Then I used a StreamReader, and the result stayed the same! Below is my code using StreamReader.

while (retrygamedoc)
{
    try
    {
        // using the native classes to check the result
        HttpWebRequest webreq = (HttpWebRequest)WebRequest.Create(url);
        string responsestring = string.Empty;
        HttpWebResponse response = (HttpWebResponse)webreq.GetResponse(); // this causes the exception
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            responsestring = reader.ReadToEnd();
        }
        gameDoc.LoadHtml(client.DownloadString(url));

        retrygamedoc = false;
    }
    catch
    {
        retrygamedoc = true;
        Thread.Sleep(500);
    }
}

What should I do and check? I am so confused because I am able to crawl some pages on the same site, and then, after about 1000 results, it causes the exception. The message from the exception is only "The request was aborted: The connection was closed unexpectedly." and the status is ConnectionClosed.

PS: the app is a desktop Windows Forms app.

Update:

Now I am skipping those values and setting them to null so that the crawling can go on. But if the data is really needed, I still have to update the crawling results manually, which is tiring because the result contains thousands of records. Please help me.

Example:

It is as if you had downloaded about 1300 records from the website, and then the application stopped, saying "The request was aborted: The connection was closed unexpectedly." while your internet connection is still up and running at a good speed.

Was it helpful?

Solution

ConnectionClosed may indicate (and probably does) that the server you're downloading from is closing the connection. Perhaps it is noticing a large number of requests from your client and is denying you additional service.

Since you can't control server-side shenanigans, I'd recommend you have some sort of logic to retry the download a bit later.
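One way to structure that retry logic, for example, is a bounded loop with an increasing delay between attempts, so a transient connection drop is retried but a persistently failing URL eventually gives up instead of looping forever. This is only a sketch; the attempt count and delay values are arbitrary assumptions, not anything from your code:

```csharp
using System;
using System.Net;
using System.Threading;

static string DownloadWithRetry(string url, int maxAttempts = 5)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            // A fresh WebClient per attempt, as in your second snippet
            using (var client = new WebClient())
            {
                return client.DownloadString(url);
            }
        }
        catch (WebException ex)
        {
            // Out of attempts: let the caller decide (e.g. skip this record)
            if (attempt >= maxAttempts)
                throw;

            // Back off longer after each failure: 1s, 2s, 4s, 8s, ...
            int delayMs = 1000 * (1 << (attempt - 1));
            Console.WriteLine("Attempt {0} failed ({1}), retrying in {2} ms",
                attempt, ex.Status, delayMs);
            Thread.Sleep(delayMs);
        }
    }
}
```

Compared to a fixed 500 ms sleep, backing off for progressively longer gives a rate-limiting server time to start accepting your requests again.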

Other tips

I got this error because the server returned a 404.
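You can detect that case by inspecting the response attached to the WebException: for protocol-level errors the server's HttpWebResponse is available on ex.Response, while for transport failures like ConnectionClosed it is null. A sketch (the skip/retry decisions in the branches are just placeholders):

```csharp
using System;
using System.Net;

static void Classify(string url)
{
    try
    {
        using (var client = new WebClient())
        {
            string html = client.DownloadString(url);
            // ... process html ...
        }
    }
    catch (WebException ex)
    {
        // For protocol errors (404, 500, ...) the server's response is attached
        var response = ex.Response as HttpWebResponse;
        if (response != null && response.StatusCode == HttpStatusCode.NotFound)
        {
            // 404: the page is gone, retrying won't help - skip this record
        }
        else
        {
            // ConnectionClosed and similar transport errors: transient,
            // worth retrying later
            Console.WriteLine("Transient failure: {0}", ex.Status);
        }
    }
}
```

This lets the crawler distinguish pages that are genuinely missing from ones that only failed temporarily, so you can skip the former and retry the latter.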

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow