Question

I'm trying to write a method that downloads a web page. First, I create an HttpURLConnection. Second, I call its connect() method. Third, I read the data through a BufferedReader.

The problem is that some pages give reasonable read times, while others are very slow (they can take about 10 minutes!). The slow pages are always the same ones, and they all come from the same website. Opening those pages in a browser takes just a few seconds instead of 10 minutes. Here is the code:

static private String getWebPage(PageNode pagenode)
{
    String result;
    String inputLine;
    URI url;
    int cicliLettura=0;
    long startTime=0, endTime, openConnTime=0,connTime=0, readTime=0;
    try
    {
        if(Core.logGetWebPage())
            startTime=System.nanoTime();
        result="";
        url=pagenode.getUri();
        if(Core.logGetWebPage())
            openConnTime=System.nanoTime();
        HttpURLConnection yc = (HttpURLConnection) url.toURL().openConnection();
        if(url.toURL().getProtocol().equalsIgnoreCase("https"))
            yc=(HttpsURLConnection)yc;
    yc.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 (.NET CLR 3.5.30729)");
        yc.connect();
        if(Core.logGetWebPage())
            connTime=System.nanoTime();
        BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));

        while ((inputLine = in.readLine()) != null)
        {
            result=result+inputLine+"\n";
            cicliLettura++;
        }
        if(Core.logGetWebPage())
            readTime=System.nanoTime();
        in.close();
        yc.disconnect();
        if(Core.logGetWebPage())
        {
            endTime=System.nanoTime();
            System.out.println(/*result+*/"getWebPage eseguito in "+(endTime-startTime)/1000000+" ms. Size: "+result.length()+" Response Code="+yc.getResponseCode()+" Protocollo="+url.toURL().getProtocol()+" openConnTime: "+(openConnTime-startTime)/1000000+" connTime:"+(connTime-openConnTime)/1000000+" readTime:"+(readTime-connTime)/1000000+" cicliLettura="+cicliLettura);
        }
        return result;
    }catch(IOException e){
        System.out.println("Eccezione: "+e.toString());
        e.printStackTrace();  
        return null;
    }
}

Here are two log samples.

One of the "normal" pages:

    getWebPage executed Size: 48261 Response Code=200 Protocol=http openConnTime: 0 connTime:1 readTime:569 cicliLettura=359

One of the "slow" pages, http://ricette.giallozafferano.it/Pan-di-spagna-al-cacao.html/allcomments:

    getWebPage executed Size: 1748261 Response Code=200 Protocol=http openConnTime: 0 connTime:1 readTime:596834 cicliLettura=35685


Solution

What you're likely seeing here is a result of the way you are accumulating result. Remember that Strings in Java are immutable, so whenever string concatenation occurs, a new String has to be instantiated, which involves copying all of the data contained in the old one. You have the following code executing for every line:

result=result+inputLine+"\n";

Under the covers, this line involves:

  1. A new StringBuffer is created with the entire content of result so far
  2. inputLine is appended to the StringBuffer
  3. The StringBuffer is converted to a String
  4. A new StringBuffer is created for that String
  5. A newline character is appended to that StringBuffer
  6. The StringBuffer is converted to a String
  7. That String is stored as result.

This operation becomes more and more time-consuming as result gets bigger, and your logs bear this out (albeit from a sample of two!): readTime jumps from 569 ms for the ~48 KB page to 596834 ms, nearly 10 minutes, for the ~1.7 MB page.
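
For reference, a modern compiler actually collapses the whole expression into a single builder chain rather than the two buffers listed above, but the cost profile is identical: the entire current result is copied on every pass. A line like result=result+inputLine+"\n"; compiles to roughly the following (a sketch of the desugaring; the exact class and call sequence depend on the compiler version):

    // Every iteration copies all of `result` into a fresh buffer,
    // then copies everything back out again in toString().
    result = new StringBuilder()
            .append(result)      // copies result.length() chars
            .append(inputLine)
            .append("\n")
            .toString();         // copies the whole buffer again

Reading n lines this way therefore costs O(n^2) character copies in total.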

Instead, use StringBuffer directly:

StringBuffer buffer = new StringBuffer();
while ((inputLine = in.readLine()) != null)
{
    buffer.append(inputLine).append('\n');
    cicliLettura++;
}
String result = buffer.toString();
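
If you want to verify the difference, here is a minimal, self-contained micro-benchmark (an unscientific sketch; the line count and average line length are taken from the "slow" log above, and absolute numbers will vary by machine):

    import java.util.Arrays;

    public class ConcatVsBuilder
    {
        public static void main(String[] args)
        {
            int lines = 35685;            // cicliLettura from the "slow" page
            char[] chars = new char[48];  // 1748261 chars / 35685 lines ≈ 48 chars + '\n'
            Arrays.fill(chars, 'x');
            String line = new String(chars);

            long t0 = System.nanoTime();
            String concat = "";
            for (int i = 0; i < lines; i++)
                concat = concat + line + "\n";    // quadratic: recopies concat every pass
            long t1 = System.nanoTime();

            StringBuilder buffer = new StringBuilder();
            for (int i = 0; i < lines; i++)
                buffer.append(line).append('\n'); // linear: amortized O(1) per append
            String built = buffer.toString();
            long t2 = System.nanoTime();

            System.out.println("concat:  " + (t1 - t0) / 1000000 + " ms");
            System.out.println("builder: " + (t2 - t1) / 1000000 + " ms");
            System.out.println("equal:   " + concat.equals(built)); // sanity check
        }
    }

On a typical machine the concatenation loop takes tens of seconds or more, while the builder loop finishes in a few milliseconds. As a design note, StringBuilder is the unsynchronized drop-in replacement for StringBuffer and is the usual choice when, as here, the buffer never leaves one thread.
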
Licensed under: CC-BY-SA with attribution