Question

For some odd reason when I try to get a webpage source using URLConnection I get a "null" in the output. Can anyone shed some light please?

My method:

public String getPageSource()
        throws IOException
{
    URL url = new URL( this.getUrl().contains( "http://" ) ? this.getUrl() : "http://" + this.getUrl() );
    URLConnection urlConnection = url.openConnection();

    BufferedReader br = new BufferedReader( new InputStreamReader( urlConnection.getInputStream(), "UTF-8" ) );

    String source = null;
    String line;

    while ( ( line = br.readLine() ) != null )
    {
        source += line;
    }

    return source;
}

How I call it:

public static void main( String[] args )
        throws IOException
{
    WebPageUtil wpu = new WebPageUtil( "www.something.com" );

    System.out.println( wpu.getPageSource() );
}

WPU constructor:

public WebPageUtil( String url )
{
    this.url = url;
}

The output is always something like:

null<html><head>... //and then the rest of the source code, which is scraped correctly

Nothing difficult, right? But where is that damn "null" coming from?!

Thanks for advice!


Solution

You're initializing the String source with a null value, so its value is converted to the literal text "null" on the first String concatenation in the while loop.
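To see the effect in isolation, here is a minimal sketch that mirrors the loop from the question without any network access. The class name NullConcatDemo and the method concatFromNull are made up for illustration; they are not part of the original code.

```java
public class NullConcatDemo {
    // Hypothetical helper mirroring the question's loop:
    // start from null and append each line with +=.
    static String concatFromNull(String... lines) {
        String source = null;
        for (String line : lines) {
            // On the first pass, the null reference is converted
            // to the four characters "null" before concatenation.
            source += line;
        }
        return source;
    }

    public static void main(String[] args) {
        // prints null<html><head>
        System.out.println(concatFromNull("<html>", "<head>"));
    }
}
```

This is standard Java string conversion: a null reference in a string concatenation becomes the text "null" rather than throwing an exception.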

Use an empty String instead:

String source = "";

or, better, use a StringBuilder, which avoids creating a new String object on every loop iteration.
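A rough sketch of the StringBuilder variant might look like the following. The class PageSourceReader and the method readAll are hypothetical names, and the example reads from a StringReader so it runs without a network connection; in the question's method you would pass the InputStreamReader built from urlConnection.getInputStream() instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class PageSourceReader {
    // Hypothetical helper: reads every line from the given Reader
    // and joins them into a single String.
    static String readAll(Reader reader) throws IOException {
        StringBuilder source = new StringBuilder();
        // try-with-resources closes the reader when done
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                // As in the original loop, line terminators are dropped.
                source.append(line);
            }
        }
        return source.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in input for the demo (assumption: any Reader works here).
        Reader demo = new StringReader("<html>\n<head>\n</head>\n</html>");
        // prints <html><head></head></html>
        System.out.println(readAll(demo));
    }
}
```

Because the builder starts out empty rather than null, an empty response yields an empty String and no stray "null" prefix appears.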

Licensed under: CC-BY-SA with attribution