Question

I'm using this great snippet from "How to download and save a file from Internet using Java?" to download a file from a URL:

URL website = new URL("http://www.website.com/information.asp");
ReadableByteChannel rbc = Channels.newChannel(website.openStream());
FileOutputStream fos = new FileOutputStream("information.html");
fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);

But instead of Long.MAX_VALUE, I'd prefer to limit the download to 2 MB for security reasons, so I replaced it with:

fos.getChannel().transferFrom(rbc, 0, 2097152);

But now I'm wondering: how can I handle the case where the file is larger than 2 MB?

What can I do to check if the file is corrupt or not?

Solution

Have you considered checking the Content-Length header as per the RFC? You could then check if this exceeds some acceptable value -- in your case 2MB -- and reject further processing. You could accomplish this with an initial HTTP HEAD request and then a GET if you're happy, or by reading the headers of just the GET response and proceeding with further streaming if acceptable.
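A minimal sketch of the HEAD-then-GET idea follows, assuming the server answers HEAD requests and reports a Content-Length. The class name ContentLengthCheck, the helper names, and the choice to treat a missing header as unacceptable are illustrative assumptions, not part of the original snippet.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class ContentLengthCheck {
    // 2 MB limit taken from the question.
    static final long MAX_BYTES = 2L * 1024 * 1024;

    /**
     * Returns true only when a length was declared and it is within the limit.
     * A negative value means the header was absent; rejecting that case is an
     * assumed policy, not something HTTP requires.
     */
    static boolean isDeclaredSizeAcceptable(long declaredLength, long maxBytes) {
        return declaredLength >= 0 && declaredLength <= maxBytes;
    }

    /** Sends a HEAD request and returns the declared Content-Length, or -1 if unknown. */
    static long fetchDeclaredLength(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setRequestMethod("HEAD");
            return conn.getContentLengthLong();
        } finally {
            conn.disconnect();
        }
    }
}
```

A caller would pass the result of fetchDeclaredLength(website) to isDeclaredSizeAcceptable(..., MAX_BYTES) and only open the stream for the GET when it returns true. Since the header is only a declaration by the server, it still makes sense to keep a hard cap on the bytes actually read.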

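Alternatively (though admittedly ugly), you could read the body through a BufferedInputStream with a 2 MB buffer (a BufferedReader would decode characters rather than count bytes) and compare the number of bytes read with the length given in the headers.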

As for corruption, you're better off using a checksum as stated in other comments. Of course, this requires you knowing the checksum for the resource up-front, and is not something you're likely to get from the HTTP response itself.

OTHER TIPS

There are actually two aspects to this Question:

  • how do you know if you've downloaded the entire file, and

  • how do you know if what you have downloaded is corrupt.

First, note that if you "chop" the transfer at 2 MB and the resulting file is exactly 2 MB, you can be fairly sure that it is incomplete. (By the looks of it, your current code gives you the bytes after any transfer encoding has been decoded, which simplifies things.)
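One way to remove the ambiguity of a file that is exactly at the limit is to probe for one extra byte once the limit is reached. The sketch below uses plain streams rather than the channel-based transferFrom call from the question; the class name BoundedCopy and the method copyAtMost are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class BoundedCopy {
    /**
     * Copies at most maxBytes from in to out.
     * Returns true if the source ended within the limit, false if it had more data.
     */
    static boolean copyAtMost(InputStream in, OutputStream out, long maxBytes) throws IOException {
        byte[] buffer = new byte[8192];
        long remaining = maxBytes;
        while (remaining > 0) {
            int n = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
            if (n == -1) {
                return true; // end of stream reached inside the limit
            }
            out.write(buffer, 0, n);
            remaining -= n;
        }
        // Limit reached: read one more byte to see whether the source is exhausted.
        return in.read() == -1;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // A 100-byte source fits inside a 200-byte limit.
        System.out.println(copyAtMost(new ByteArrayInputStream(new byte[100]), sink, 200)); // true
        sink.reset();
        // A 300-byte source is cut off after 200 bytes.
        System.out.println(copyAtMost(new ByteArrayInputStream(new byte[300]), sink, 200)); // false
        System.out.println(sink.size()); // 200
    }
}
```

For the download in the question, in would be website.openStream(), out the FileOutputStream, and maxBytes 2097152; a false result means the file was truncated and should probably be deleted.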

Next, note that an HTTP response will often include a Content-Length header that tells the client how many bytes of (transfer encoded) content to expect in the response body. However, that won't tell you whether the bytes you actually received (after decoding) are correct. (Besides, this header is optional; you can't rely on it being there.)

As @ato notes, you would be better off checking the Content-Length in the GET (or a HEAD) response before you actually try to read the data.

However, the only sure-fire way to know whether you've got a complete, non-corrupt file is to check it against a checksum or (ideally) a cryptographic hash that you obtained separately from the transfer. HTTP itself offers no widely supported standard way of obtaining a checksum or hash.
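As a rough illustration of the hash check, the sketch below computes a SHA-256 digest with the JDK's MessageDigest and compares it with an expected hex string. It assumes the expected value was published by the file's provider through some separate channel; the class name HashCheck and both method names are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class HashCheck {
    /** Streams the input through SHA-256 and returns the digest as lowercase hex. */
    static String sha256Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            digest.update(buffer, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Returns true if the data hashes to the expected hex digest (case-insensitive). */
    static boolean matches(InputStream in, String expectedHex)
            throws IOException, NoSuchAlgorithmException {
        return sha256Hex(in).equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
        // Well-known SHA-256 test vector for the three bytes "abc".
        String expected = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad";
        System.out.println(matches(new ByteArrayInputStream(data), expected)); // true
    }
}
```

For a downloaded file, the input would be a FileInputStream over information.html. A mismatch covers both truncation and corruption, so it makes a separate size comparison unnecessary whenever a trusted hash is available.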

Licensed under: CC-BY-SA with attribution