Question

I am trying to read a text file which, as of now, has around 300,000 (3 lakh) lines.

How am I reading?

I am reading it using java.io.BufferedReader.

Here is a small code snippet which represents my approach.

int lineNumber = 1;
String currentLine = null;
BufferedReader br = new BufferedReader(new FileReader(f)); // here f is the name of the file to be read, passed in
while ((currentLine = br.readLine()) != null) {
  // here I have written logic to do processing after reading 1000 lines:
  // each line read is put in a List collection; on reaching line 1001 the
  // batch is processed, the List is cleared, and the loop continues
}
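
Fleshed out, the batching described in those comments looks roughly like this (processBatch is a stand-in for my actual processing logic):

List<String> batch = new ArrayList<>(1000);
try (BufferedReader reader = new BufferedReader(new FileReader(f))) {
    String line;
    while ((line = reader.readLine()) != null) {
        batch.add(line);
        if (batch.size() == 1000) {
            processBatch(batch); // placeholder for my actual processing
            batch.clear();       // clear the list and continue the loop
        }
    }
    if (!batch.isEmpty()) {
        processBatch(batch);     // handle the final partial batch
    }
}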

I have also tried NIO2, as follows:

br = Files.newBufferedReader(Paths.get(inputFileName), StandardCharsets.UTF_16);

It resulted in the following exception:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Unknown Source)
    at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
    at java.lang.AbstractStringBuilder.append(Unknown Source)
    at java.lang.StringBuffer.append(Unknown Source)
    at java.io.BufferedReader.readLine(Unknown Source)
    at java.io.BufferedReader.readLine(Unknown Source)
    at TexttoExcelMerger.readFileLineByLine(TexttoExcelMerger.java:66)
    at TexttoExcelMerger.main(TexttoExcelMerger.java:255)

Firstly, is my approach right?

Are there any efficient, faster approaches in NIO2, Apache Commons FileUtils, or any other API that would improve my file-reading process? Can I read a set of lines at once, say the first 1000, with something like

br.readFirst(1000);

but without reading line by line or iterating as in my logic?


Solution

Any approach that reads an entire file into memory is doomed to failure. Sooner or later the file will exceed available memory, at which point the program will cease to function and have to be redesigned completely. That's not a good failure mode, as there is nothing the user can do in the interim. You are at that point now. You were pretty ambitious to even try it with a file of hundreds of thousands of lines. Rethink, and process a line at a time. Or use a database.
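
As a sketch of what "a line at a time" means in practice (handleLine here is a placeholder for whatever per-line work is needed, and the charset must match the file's actual encoding):

try (BufferedReader br = Files.newBufferedReader(Paths.get(inputFileName), StandardCharsets.UTF_16)) {
    String line;
    while ((line = br.readLine()) != null) {
        handleLine(line); // process the line, then let it go; nothing accumulates
    }
} // try-with-resources closes the reader even if processing throws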

NB: don't kid yourself. You're using java.io to read the file; the NIO2 component here is minimal. Not that you need it at all.

OTHER TIPS

Out of memory exception

You're running out of memory because you're trying to read too much of the file into memory at once. I can think of two ways this could be happening.

You're doing it deliberately

If you're trying to save every line that you read in, you're going to run out of memory:

StringBuilder stringBuilder = new StringBuilder();
while ((currentLine = br.readLine()) != null) {
    stringBuilder.append(currentLine); // every line read stays in memory
}

If you're just trying to save 1000 lines at a time, you might be able to just increase Java's heap size with -Xmx and be OK. It all depends on how much memory 1000 lines takes up.
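
For example, to run with a 2 GB heap (the class name is taken from the stack trace above; the right size depends on how big those 1000 lines actually are):

java -Xmx2g TexttoExcelMerger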

You're doing it accidentally

If the file you're reading doesn't have any line breaks, then br.readLine() will attempt to read the whole thing into memory, believing that it's one gigantic line.
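
If unbounded lines are a possibility, a defensive sketch is to read fixed-size chunks instead of lines, so memory use stays bounded no matter what the file contains (processChunk is a placeholder for the chunk handling):

char[] buffer = new char[8192];      // at most 8192 chars in memory at once
int charsRead;
while ((charsRead = br.read(buffer, 0, buffer.length)) != -1) {
    processChunk(buffer, charsRead); // handle exactly charsRead chars
}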

Reading without going line-by-line

If you imagine an arbitrary file of text, it's just a long string of characters. Some of these characters (EOL) have special meaning to humans and many programs, but they're still just characters. This means that you can't just say "give me the 10th line of text" without reading every character that comes before it (because you never know which character might be an EOL that you need to count).

You could use a fixed-length record format: you say that each line will be exactly n characters long (80, say). Now if you want to jump to the 10th line, you can skip straight past the 9 × 80 = 720 characters that precede it. But if you're actually using UTF-16, then a character isn't always a single char and this doesn't really work.

That's OK, because you probably should be using a database at this point.
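
For completeness, here is what that fixed-length-record jump looks like when the encoding does cooperate — a sketch assuming 80-byte, single-byte-encoded records (readRecord and the record size are illustrative, not from the question):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Jump straight to record n (zero-based) without reading anything before it.
static String readRecord(String fileName, long recordNumber) throws IOException {
    final int RECORD_LENGTH = 80; // assumed fixed record size in bytes
    try (RandomAccessFile raf = new RandomAccessFile(fileName, "r")) {
        raf.seek(recordNumber * RECORD_LENGTH); // e.g. record 9 starts at byte 720
        byte[] record = new byte[RECORD_LENGTH];
        raf.readFully(record);
        return new String(record, StandardCharsets.US_ASCII);
    }
}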

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow