Question

I am developing a program which reads a text file and creates a report. For every line in the file, the report contains its line number, its length, its "status" (ok/fail, depending on the length) and the first 38 characters of the line. It works well with files up to 100 MB.

But when I run the program on input files larger than 1.5 GB that contain more than 100,000 lines, I get the following error:

> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>     at java.util.Arrays.copyOfRange(Unknown Source)
>     at java.lang.String.<init>(Unknown Source)
>     at java.lang.StringBuffer.toString(Unknown Source)
>     at java.io.BufferedReader.readLine(Unknown Source)
>     at java.io.BufferedReader.readLine(Unknown Source)
>     at org.apache.commons.io.IOUtils.readLines(IOUtils.java:771)
>     at org.apache.commons.io.IOUtils.readLines(IOUtils.java:723)
>     at org.apache.commons.io.IOUtils.readLines(IOUtils.java:745)
>     at org.apache.commons.io.FileUtils.readLines(FileUtils.java:1512)
>     at org.apache.commons.io.FileUtils.readLines(FileUtils.java:1528)
>     at org.apache.commons.io.ReadFileToListSample.main(ReadFileToListSample.java:43)

I increased the VM arguments to -Xms128m -Xmx1600m (in the Eclipse run configuration), but this did not help. Specialists from the OTN forum advised me to read some books and improve my program's performance. Could anybody help me improve it? Thank you.
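You can check whether the -Xmx value from the Eclipse run configuration actually reaches the JVM by printing the maximum heap size at startup. The sketch below does this. The class name HeapCheck is hypothetical and only for illustration. In ReadFileToList, the print would have to run before System.setOut(...) redirects standard output to the report file.

```java
public class HeapCheck {

    // Maximum heap the JVM will attempt to use, in megabytes (reflects -Xmx when it is set)
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```

If the printed value matches -Xmx and the error still occurs, the heap setting was applied. In that case the problem is how much data the program keeps in memory at once.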

Code:

import org.apache.commons.io.FileUtils;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.PrintStream;
import java.util.List;

public class ReadFileToList {

    public static void main(String[] args) throws FileNotFoundException
    {
        File file_out = new File ("D:\\Docs\\test_out.txt");
        FileOutputStream fos = new FileOutputStream(file_out);
        PrintStream ps = new PrintStream (fos);
        System.setOut (ps);

        // Create a file object
        File file = new File("D:\\Docs\\test_in.txt");

        FileReader fr = null;
        LineNumberReader lnr = null;

        try {
            // Here we read a file, sample.txt, using FileUtils
            // class of commons-io. Using FileUtils.readLines()
            // we can read file content line by line and return
            // the result as a List of string.

            List<String> contents = FileUtils.readLines(file);
            //
            // Iterate the result to print each line of the file.

            fr = new FileReader(file);
            lnr = new LineNumberReader(fr);

            for (String line : contents)
            {
                String begin_line = line.substring(0, 38); // return 38 chars from the string
                String begin_line_without_null = begin_line.replace("\u0000", " ");
                String begin_line_without_null_spaces = begin_line_without_null.replaceAll(" +", " ");

                int stringlenght = line.length();
                line = lnr.readLine();
                int line_num = lnr.getLineNumber();

                String status;

                // some correct length for if
                int c_u_length_f = 12;
                int c_ea_length_f = 13;
                int c_a_length_f = 2130;
                int c_u_length_e = 3430;
                int c_ea_length_e = 1331;
                int c_a_length_e = 442;
                int h_ext = 6;
                int t_ext = 6;

                if ( stringlenght == c_u_length_f ||
                     stringlenght == c_ea_length_f ||
                     stringlenght == c_a_length_f ||
                     stringlenght == c_u_length_e ||
                     stringlenght == c_ea_length_e ||
                     stringlenght == c_a_length_e ||
                     stringlenght == h_ext ||
                     stringlenght == t_ext)
                    status = "ok";
                else status = "fail";

                System.out.println(+ line_num + stringlenght + status + begin_line_without_null_spaces);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Also, specialists from OTN said that this program opens the input file and reads it twice. Maybe there is a mistake in the for statement? I can't find it. Thank you.


Solution

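You're declaring variables inside the loop and doing a lot of unneeded work, including reading the file twice, which is not good for performance either. The main problem is FileUtils.readLines. It loads every line of the file into a List<String> before the loop even starts, so the whole 1.5 GB file has to fit in the heap at once. Java strings usually take at least as much memory as the raw bytes on disk, and often more. Raising -Xmx only moves the limit. Instead, read the file once, line by line, with the LineNumberReader. It gives you both the line number and the text, and you can reuse the line variable (declared outside the loop). Here's a shortened version that does what you need. You'll need to complete the validLength method to check all the values, since I included only the first couple of tests.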

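import java.io.*;

public class TestFile {

    // a method to determine if the length is valid, implemented outside the method that does the reading
    private static String validLength(int length) {
        if (length == 12 || length == 13 || length == 2130) // you can finish it (3430, 1331, 442, 6)
            return "ok";
        return "fail";
    }

    public static void main(String[] args) {
        // try-with-resources closes both files, even if an exception is thrown
        try (LineNumberReader lnr = new LineNumberReader(new FileReader(args[0]));
             BufferedWriter out = new BufferedWriter(new FileWriter(args[1]))) {
            String line;
            int length;
            // read one line at a time, so only the current line is kept in memory
            while (null != (line = lnr.readLine())) {
                length = line.length();
                // at most 38 chars: substring(0, 38) would throw on shorter lines (e.g. length 6 or 12)
                line = line.substring(0, Math.min(38, length));
                line = line.replace("\u0000", " ");  // NUL chars become spaces
                line = line.replaceAll(" +", " ");   // collapse runs of spaces into one
                // the " " separators stop the line number and length from being added together
                out.write(lnr.getLineNumber() + " " + length + " " + validLength(length) + " " + line);
                out.newLine();
            }
        }
        catch (IOException e) {
            e.printStackTrace();
        }
    }
}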
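You can check the per-line logic without a 1.5 GB file by pulling it out into methods that work on any Reader and Writer. The sketch below does that under some assumptions. The class and method names (ReportSketch, status, reportLine, writeReport) are hypothetical. The output format, with fields separated by single spaces, is also an assumption. The valid lengths are the constants from the question.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.Writer;

public class ReportSketch {

    // Valid line lengths, copied from the constants in the question
    private static final int[] VALID_LENGTHS = {6, 12, 13, 442, 1331, 2130, 3430};

    // "ok" if the length is one of the valid lengths, otherwise "fail"
    static String status(int length) {
        for (int valid : VALID_LENGTHS) {
            if (length == valid) {
                return "ok";
            }
        }
        return "fail";
    }

    // One report line: line number, length, status and the first (up to) 38 chars,
    // with NUL chars turned into spaces and runs of spaces collapsed to one
    static String reportLine(int lineNumber, String line) {
        String begin = line.substring(0, Math.min(38, line.length()))
                .replace('\u0000', ' ')
                .replaceAll(" +", " ");
        return lineNumber + " " + line.length() + " " + status(line.length()) + " " + begin;
    }

    // Reads the input once, line by line, so only the current line is held in memory.
    // The caller is responsible for opening and closing the Reader and Writer.
    static void writeReport(Reader in, Writer out) throws IOException {
        LineNumberReader lnr = new LineNumberReader(in);
        String line;
        while ((line = lnr.readLine()) != null) {
            out.write(reportLine(lnr.getLineNumber(), line));
            out.write(System.lineSeparator());
        }
        out.flush();
    }
}
```

In main, writeReport could receive a FileReader for args[0] and a BufferedWriter for args[1], both opened in a try-with-resources block.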

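Call this as java TestFile D:\Docs\test_in.txt D:\Docs\test_out.txt, or replace args[0] and args[1] with the file names if you want to hard-code them. Make sure the two paths differ: FileWriter truncates the output file when it opens it, so passing the input file twice would wipe it.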

Licensed under: CC-BY-SA with attribution