Question

I'm new to Java. In my current project I need to read and write very large text files (1 GB - 5 GB). First I used these classes: BufferedReader and BufferedWriter

public static String read(String dir) {
    BufferedReader br;
    String result = "", line;
    try {
        br = new BufferedReader(new InputStreamReader(new FileInputStream(dir), "UTF-8"));
        while ((line = br.readLine()) != null) {
            result += line + "\n";
        }
    } catch (IOException ex) {
        //do something
    }
    return result;
}

public static void write(String dir, String text) {
    BufferedWriter bw;
    try {
        bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(dir), "UTF-8"));
        bw.write("");
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) != '\n') {
                bw.append(text.charAt(i));
            } else {
                bw.newLine();
            }
        }
        bw.flush();
    } catch (IOException ex) {
        //do something
    }
}

These classes work very well, but not for huge files...

Then I used MappedByteBuffer for the read() method (I don't know how to write a file using this class):

public static String read(String dir) {
    FileChannel fc;
    String s = "";
    try {
        fc = new RandomAccessFile(dir, "r").getChannel();
        MappedByteBuffer buffer = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
        buffer.load();
        buffer.force();
        for (int i = 0; i < buffer.limit(); i++) {
            s += (char) buffer.get();
        } //I know the problem is here
        buffer.clear();
        fc.close();
    } catch (IOException e) {
        //do something
    }
    return s;
}

But it still can't read large files (over 30-40 MB); even Notepad is faster than my app :))

Another problem is that I don't know how to change the encoding with the second approach (for example "UTF-8", "ANSI", ...).

So guys, please tell me: which is the best way to read and write laaaarge files? Any ideas?

Solution 2

At the very least, I'd recommend changing

result += line + "\n";

to a StringBuilder.

resultBldr.append(line).append("\n");

This avoids creating a new string object (a bigger and bigger and bigger one!) on each line.
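
For example, the read() method from the question, rewritten around a StringBuilder (a minimal sketch; the try-with-resources form assumes Java 7+):

public static String read(String dir) {
    StringBuilder result = new StringBuilder();
    try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(dir), "UTF-8"))) {
        String line;
        while ((line = br.readLine()) != null) {
            // append() reuses the builder's internal buffer instead of
            // allocating a brand-new String for every line
            result.append(line).append("\n");
        }
    } catch (IOException ex) {
        //do something
    }
    return result.toString();
}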

Also, you should definitely write your output to the file line by line. Don't accumulate all that text and then output it.

In other words, in this situation, complete separation between your read and write functions is not recommended.
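
A rough sketch of that combined approach, streaming each line straight from input to output (the processLine transformation here is hypothetical; substitute whatever per-line work you actually need):

public static void transform(String inDir, String outDir) {
    try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(inDir), "UTF-8"));
         BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outDir), "UTF-8"))) {
        String line;
        while ((line = br.readLine()) != null) {
            // only one line is held in memory at a time
            bw.write(processLine(line));
            bw.newLine();
        }
    } catch (IOException ex) {
        //do something
    }
}

// hypothetical per-line transformation; identity here
private static String processLine(String line) {
    return line;
}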

OTHER TIPS

result += line + "\n";

This line tries to keep the entire file contents in memory. Try to process each line as you read it, like this:

while ((line = br.readLine()) != null) {
    processLine(line); // this may write it to another file.
}

Remember that every concatenation of strings creates a new string, so if you read every character of a 40 MB file and concatenate as you go, read() creates something like 40,000,000 strings in total.

Try to use StringBuffer (or better, the unsynchronized StringBuilder) instead of String; that is what's recommended for these situations.

It's always a bad idea to read large files in the 1 GB - 5 GB range in a single shot. There will be huge performance overhead and your app will slow down.

It's better to process such a huge file in smaller chunks, reading it chunk by chunk. I think if you start reading the file in smaller chunks, the code you have written will work perfectly fine.
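
As a sketch of that idea, here is one way to read a fixed number of characters at a time (the 8192-character buffer size is an arbitrary choice):

public static void readInChunks(String dir) {
    try (Reader reader = new InputStreamReader(new FileInputStream(dir), "UTF-8")) {
        char[] chunk = new char[8192];
        int n;
        while ((n = reader.read(chunk)) != -1) {
            // process chunk[0..n) here; the whole file never sits in memory
        }
    } catch (IOException ex) {
        //do something
    }
}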

Have you heard about HDFS, Solr indexing, or the Apache Hadoop framework, which are specifically designed for handling huge amounts of data? You might want to have a look into them.
