Question
I am trying to write a Java program to back up an HTTP directory on a remote server. The remote server is across several VPNs/firewalls/whatever, so the connection is not always the best.
So I start by downloading the root directory listing and go through the entries recursively. It is a simple single-threaded program.
My problem is that sometimes the HTML I get is corrupted. Mainly it has multiple null bytes scattered over the whole document, which I can remove with a replaceAll. But the other thing is that some text chunks seem to appear two (or more?) times, so instead of "This is a text, please read me." I get something like "This is a teis is a xt, please read me.". If you cut out the duplicate "is is a ", it would be just fine. There are usually several of these duplicated chunks across the whole document.
When I browse the directory with a browser (namely Firefox) I have no problems, everything seems fine. Only my downloader keeps getting corrupt data.
Here is my code snippet, which gets the HTML listing data:
InputStream is = con.getInputStream();
if ("gzip".equals(con.getContentEncoding())) {
    is = new GZIPInputStream(is);
}
int x = 0;
byte[] data = new byte[1024];
while ((x = is.read(data, 0, 1024)) >= 0) {
    if (x > 0) {
        retval += new String(data);
    }
}
Any ideas what I am doing wrong?
Greetings!
Solution
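In the loop, replace the line retval += new String(data); with this:
retval += new String(data, 0, x);
new String(data) always converts the whole 1024-byte buffer, no matter how many bytes the last read actually delivered. If a read returns only x bytes (x less than 1024) after an earlier read filled all 1024, you get the x new bytes plus the (1024 - x) bytes left over from the previous loop iteration, which is where the duplicated text comes from. The null bytes are the part of the buffer that no read has filled yet.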
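For reference, here is a minimal sketch of the read loop with only the freshly read bytes being used. The class name ChunkedRead, the helper readAll, and the charset parameter are hypothetical and not part of the original code. The sketch also collects the raw bytes first and decodes them once at the end, which is an extra precaution (not something the original asked for) so that a multi-byte character split across two reads is not decoded in halves.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

class ChunkedRead {
    // Hypothetical helper: reads the stream to the end and decodes it once.
    // The charset is an assumption; use whatever the server actually sends.
    static String readAll(InputStream is, Charset charset) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] data = new byte[1024];
        int x;
        while ((x = is.read(data, 0, data.length)) >= 0) {
            // Copy only the x bytes filled by this read, never the stale
            // tail of the buffer left over from an earlier iteration.
            collected.write(data, 0, x);
        }
        return new String(collected.toByteArray(), charset);
    }
}
```

In the original snippet this would be called as something like retval = ChunkedRead.readAll(is, StandardCharsets.UTF_8); after the gzip check, assuming the listing is UTF-8.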