Question

I am trying to read files from a public anonymous FTP server and I am running into a problem. I can read the plain text files just fine, but when I try to read gzip files, I get this exception:

    Exception in thread "main" java.util.zip.ZipException: invalid distance too far back
        at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:164)
        at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:116)
        at java.io.FilterInputStream.read(FilterInputStream.java:107)
        at java_io_FilterInputStream$read.call(Unknown Source)
        at GenBankFilePoc.main(GenBankFilePoc.groovy:36)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)

I have tried downloading the file and using a FileInputStream wrapped in a GZIPInputStream and got the exact same problem, so I don't think it is a problem with the FTP client (which is apache).

Here is some test code that reproduces the problem. It is just trying to print to stdout:

    FTPClient ftp = new FTPClient();
    ftp.connect("ftp.ncbi.nih.gov");
    ftp.login("anonymous", "");
    InputStream is = new GZIPInputStream(ftp.retrieveFileStream("/genbank/gbbct1.seq.gz"));

    try {
        byte[] buffer = new byte[65536];
        int noRead;

        while ((noRead = is.read(buffer)) != -1) {
            System.out.write(buffer, 0, noRead);
        }
    } finally {
        is.close();
        ftp.disconnect();
    }

I cannot find any documentation on why this would be happening, and following it through the code in a debugger is not getting me anywhere. I feel like I am missing something obvious.

EDIT: I manually downloaded the file and read it in with a GZIPInputStream and was able to print it out just fine. I have tried this with two different Java FTP clients.

Solution

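Ah, I found out what was wrong. You have to set the file type to FTP.BINARY_FILE_TYPE before calling retrieveFileStream. FTPClient defaults to ASCII mode, in which line endings can be translated in transit, and that corrupts compressed data before GZIPInputStream ever sees it.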

The following code works:

    FTPClient ftp = new FTPClient();
    ftp.connect("ftp.ncbi.nih.gov");
    ftp.login("anonymous", "");
    ftp.setFileType(FTP.BINARY_FILE_TYPE);
    InputStream is = new GZIPInputStream(ftp.retrieveFileStream("/genbank/gbbct1.seq.gz"));

    try {
        byte[] buffer = new byte[65536];
        int noRead;

        while ((noRead = is.read(buffer)) != -1) {
            System.out.write(buffer, 0, noRead);
        }
    } finally {
        is.close();
        ftp.disconnect();
    }
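
To see why the transfer mode matters, the effect can be reproduced without any FTP server. The sketch below is only an approximation: `asciiTranslate` is a hypothetical stand-in that assumes the translation collapses every CR LF pair to LF, whereas what a real ASCII-mode transfer rewrites depends on the server and the client platform. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class AsciiModeDemo {

    /** Compresses data into an in-memory gzip stream. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Assumed model of an ASCII-mode transfer: drops the CR of every CR LF pair. */
    static byte[] asciiTranslate(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length);
        for (int i = 0; i < data.length; i++) {
            boolean crBeforeLf = data[i] == '\r'
                    && i + 1 < data.length
                    && data[i + 1] == '\n';
            if (!crBeforeLf) {
                out.write(data[i]);
            }
        }
        return out.toByteArray();
    }

    /** Returns true only if gz decompresses cleanly to exactly the expected bytes. */
    static boolean decompressesTo(byte[] gz, byte[] expected) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[65536];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return Arrays.equals(out.toByteArray(), expected);
        } catch (IOException e) {
            // ZipException and EOFException both land here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Several megabytes of compressed data will almost certainly contain CR LF by chance.
        byte[] original = new byte[4 * 1024 * 1024];
        new Random(42).nextBytes(original);

        byte[] gz = gzip(original);
        byte[] mangled = asciiTranslate(gz);

        System.out.println("bytes lost to translation: " + (gz.length - mangled.length));
        System.out.println("binary copy decompresses:  " + decompressesTo(gz, original));
        System.out.println("mangled copy decompresses: " + decompressesTo(mangled, original));
    }
}
```

Losing even a handful of bytes shifts everything after them, so the inflater ends up reading garbage. That would be consistent with an error such as "invalid distance too far back" appearing partway through the stream rather than at the start.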

OTHER TIPS

You need to download the file completely first, since ftp.retrieveFileStream() doesn't support file seeking.

Your code should be:

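    FTPClient ftp = new FTPClient();
    ftp.connect("ftp.ncbi.nih.gov");
    ftp.login("anonymous", "");
    // Binary mode is still needed here, as in the solution above.
    ftp.setFileType(FTP.BINARY_FILE_TYPE);
    // new File("") is not a writable path; a temporary file is substituted here.
    File downloaded = File.createTempFile("gbbct1", ".seq.gz");
    // Close the output stream before reading the file back.
    try (FileOutputStream fos = new FileOutputStream(downloaded)) {
        ftp.retrieveFile("/genbank/gbbct1.seq.gz", fos);
    }
    InputStream is = new GZIPInputStream(new FileInputStream(downloaded));

    try {
        byte[] buffer = new byte[65536];
        int noRead;

        // read() returns -1 at end of stream.
        while ((noRead = is.read(buffer)) != -1) {
            System.out.write(buffer, 0, noRead);
        }
    } finally {
        is.close();
        ftp.disconnect();
    }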
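
Whether the download-first step is really required can be checked in isolation. The sketch below feeds GZIPInputStream from a source that only supports forward reads: mark/reset is disabled and data arrives a few bytes at a time, roughly like a socket. The class name, the `forwardOnly` wrapper and the chunk size are all made up for illustration. If the data still decompresses correctly, that suggests seeking is not what GZIPInputStream needs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class ForwardOnlyGzipDemo {

    /** Compresses data into an in-memory gzip stream. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Wraps a stream so it offers forward reads only, at most maxChunk bytes per call. */
    static InputStream forwardOnly(InputStream source, int maxChunk) {
        return new FilterInputStream(source) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, maxChunk));
            }

            @Override
            public boolean markSupported() {
                return false;
            }

            @Override
            public void mark(int readlimit) {
                // Intentionally a no-op: this stream cannot rewind.
            }

            @Override
            public void reset() throws IOException {
                throw new IOException("mark/reset not supported");
            }
        };
    }

    /** Decompresses everything from a raw gzip source, reading sequentially. */
    static byte[] gunzip(InputStream raw) {
        try (GZIPInputStream in = new GZIPInputStream(raw)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[65536];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] text = "LOCUS       EXAMPLE\nORIGIN\n//\n".getBytes(StandardCharsets.UTF_8);
        InputStream socketLike = forwardOnly(new ByteArrayInputStream(gzip(text)), 7);
        byte[] result = gunzip(socketLike);
        System.out.println("round trip intact: " + Arrays.equals(result, text));
    }
}
```

If this holds for the stream returned by retrieveFileStream as well, reading it directly should work once the transfer is in binary mode, and downloading to a file first becomes a matter of preference.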
Licensed under: CC-BY-SA with attribution