Question

I have a program that loads data into a DB through named pipes, which works very well. It has been running for about 2 years and accepts plain text or gzip files.

Now some zip files have shown up that need to be loaded, and I want to extend the program to handle them. But I can't get this to work: I'm getting an OutOfMemoryError.

(Of course, I'm running it with -Xms512M -Xmx2048M.)

Below is how I get the BufferedReader:

PipeLoader.java

protected BufferedReader getBufferedReader(File file, String compression) throws Exception {
    BufferedReader bufferedReader = null;

    if(compression.isEmpty())   {
        bufferedReader = new BufferedReader(new FileReader(file), BUFFER);
    } else if(compression.equalsIgnoreCase("gzip")) {
        InputStream fileStream = new FileInputStream(file);
        InputStream gzipStream = new GZIPInputStream(fileStream);

        // Works fine
        Reader reader = new InputStreamReader(gzipStream);
        bufferedReader = new BufferedReader(reader, BUFFER);
    } else if(compression.equalsIgnoreCase("zip")){
        InputStream fileStream = new FileInputStream(file);
        ZipInputStream zipStream = new ZipInputStream(fileStream);
        zipStream.getNextEntry(); // For testing purposes I'm getting only the first entry

        Reader reader = new InputStreamReader(zipStream); // Works only with small zips
        bufferedReader = new BufferedReader(reader, BUFFER);
    }

    return bufferedReader;
}

I also tried the TrueVFS library:

// The same: works with small zip files, OutOfMemoryError with big zip files
TFile tFile = new TFile(file);
TFileInputStream tfis = new TFileInputStream(new TFile(tFile.getAbsolutePath(), tFile.list()[0]));

Reader reader = new InputStreamReader(tfis);
bufferedReader = new BufferedReader(reader, BUFFER);

And yes, I'm closing everything properly (remember, it works with gz!).

In this case I need to load a zip file containing only one plain text file (~4GB zipped, ~35GB unzipped).

I got an OutOfMemoryError on the first file, less than 1 minute after starting.

P.S.: This is not a duplicate of "Reading a huge Zip file in java - Out of Memory Error": that asker had the option of reading each of the small files inside the zip, but I have only one big file.

I ran with -XX:+HeapDumpOnOutOfMemoryError and read the .hprof file with Memory Analyzer, but it didn't help me much:

[Screenshot: MemoryAnalyser.png]

Please, I need help.


Solution

If you look at the stacktrace, you can see that BufferedReader.readLine() ultimately leads to the creation of a very large array, which is causing the OutOfMemoryError.
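To see how large that array would have to grow, one option is to measure the longest line in the input without ever holding a whole line in memory. The following is only a minimal sketch: the class name LineLengthProbe and the 8192-character chunk size are made up for illustration, and it assumes the Reader is built the same way as in the question (for example, an InputStreamReader over the ZipInputStream).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper, not part of the original PipeLoader.
class LineLengthProbe {

    /**
     * Returns the length (in chars) of the longest line in the input.
     * Reads in fixed-size chunks, so memory use stays constant even if
     * the input contains no line breaks at all.
     */
    static long longestLine(Reader reader) throws IOException {
        char[] chunk = new char[8192]; // assumed chunk size
        long longest = 0;
        long current = 0;
        int n;
        while ((n = reader.read(chunk, 0, chunk.length)) != -1) {
            for (int i = 0; i < n; i++) {
                char c = chunk[i];
                if (c == '\n' || c == '\r') {
                    // End of a line: remember it if it is the longest so far.
                    longest = Math.max(longest, current);
                    current = 0;
                } else {
                    current++;
                }
            }
        }
        // Account for a final line that has no trailing line break.
        return Math.max(longest, current);
    }

    public static void main(String[] args) throws IOException {
        // Small in-memory demo; a real run would pass the zip-backed Reader.
        System.out.println(longestLine(new StringReader("ab\ncdef\n\ng")));
    }
}
```

If the reported value is in the hundreds of millions or more, readLine() cannot be expected to fit a single line into a 2GB heap.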

Since readLine() keeps reading the input until it reaches a line break, this indicates that there are no (or very few) line breaks in the zipped input file.
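If that is the case, one possible workaround is to stop using readLine() and forward the data in fixed-size chunks instead. The sketch below assumes the loader only needs to pass the text through to the named pipe and does not need to process it line by line; the class name ChunkedCopy, the method copy, and the chunk size are hypothetical and not taken from the original program.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper, not part of the original PipeLoader.
class ChunkedCopy {

    /**
     * Copies all characters from in to out using a buffer of chunkSize
     * chars. Memory use is bounded by chunkSize regardless of how long
     * the lines in the input are. Returns the number of chars copied.
     */
    static long copy(Reader in, Writer out, int chunkSize) throws IOException {
        char[] chunk = new char[chunkSize];
        long total = 0;
        int n;
        while ((n = in.read(chunk, 0, chunk.length)) != -1) {
            out.write(chunk, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory demo; in the loader, "in" would be the zip-backed Reader
        // and "out" a Writer opened on the named pipe (both assumptions).
        StringWriter out = new StringWriter();
        long copied = copy(new StringReader("a;b;c"), out, 2);
        System.out.println(copied + " chars: " + out);
    }
}
```

If the file really uses some other record separator, the same chunked loop could be extended to split on that separator, as long as each record is small enough to fit in memory.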

Licensed under: CC-BY-SA with attribution