Problem

I have some Java code that works as expected, but it takes several seconds to run even though the job is just looping through an array.

The input file is a FASTA file. The file I'm using is 2.9 MB, and some other FASTA files can be up to 20 MB.


In the code I'm trying to loop through it in groups of three, e.g. AGC TTT TCA ... The code has no functional purpose yet, but what I eventually want is to pair each amino acid with its corresponding group of three bases. Example:

AGC - Ser / CUG - Leu / ...

So what's wrong with the code, and is there a better way to do it? Any optimizations? Looping through the whole string takes a few seconds, and I need to find a faster way to do it.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class fasta {
    public static void main(String[] args) throws IOException {

        File fastaFile;
        FileReader fastaReader;
        BufferedReader fastaBuffer = null;
        StringBuilder fastaString = new StringBuilder();

        try {
            fastaFile = new File("res/NC_017108.fna");
            fastaReader = new FileReader(fastaFile);
            fastaBuffer = new BufferedReader(fastaReader);
            String fastaDescription = fastaBuffer.readLine();
            String line = fastaBuffer.readLine();

            while (line != null) {
                fastaString.append(line);
                line = fastaBuffer.readLine();
            }

            System.out.println(fastaDescription);
            System.out.println();
            String currentFastaAcid;

            for (int i = 0; i + 3 <= fastaString.length(); i += 3) {
                currentFastaAcid = fastaString.toString().substring(i, i + 3);
                System.out.println(currentFastaAcid);
            }

        } catch (NullPointerException e) {
            System.out.println(e.getMessage());
        } catch (FileNotFoundException e) {
            System.out.println(e.getMessage());
        } catch (IOException e) {
            System.out.println(e.getMessage());
        } finally {
            // fastaBuffer is still null if opening the file failed
            if (fastaBuffer != null) {
                fastaBuffer.close();
            }
        }

    }

}
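The mapping the question describes, pairing each group of three bases with its amino acid, could look roughly like the sketch below. It is illustrative only: the class name `CodonTranslator`, the method `translate`, and the deliberately partial table (a handful of standard genetic-code entries written with DNA letters, so the RNA codon CUG appears as CTG) are assumptions, not part of the original post. Unknown codons map to `?`, and `Map.of` needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class CodonTranslator {
    // Hypothetical, deliberately partial table using DNA letters (T instead of U).
    // A real program would list all 64 codons of the standard genetic code.
    private static final Map<String, String> CODON_TABLE = Map.of(
            "AGC", "Ser",
            "TCA", "Ser",
            "TTT", "Phe",
            "CTG", "Leu",
            "ATG", "Met");

    // Pairs each complete three-base group with its amino acid, e.g. "AGC - Ser".
    // A trailing group shorter than three bases is ignored.
    static List<String> translate(CharSequence bases) {
        List<String> result = new ArrayList<>(bases.length() / 3);
        for (int i = 0; i + 3 <= bases.length(); i += 3) {
            String codon = bases.subSequence(i, i + 3).toString();
            result.add(codon + " - " + CODON_TABLE.getOrDefault(codon, "?"));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [AGC - Ser, TTT - Phe, TCA - Ser, CTG - Leu]
        System.out.println(translate(new StringBuilder("AGCTTTTCACTG")));
    }
}
```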

Solution 2

The biggest factor here is that you call substring on a new String each time: fastaString.toString() copies the entire sequence on every iteration.

Instead, call substring directly on the StringBuilder:

for (int i = 0; i + 3 <= fastaString.length(); i += 3) {
    currentFastaAcid = fastaString.substring(i, i + 3);
    System.out.println(currentFastaAcid);
}

Also, instead of printing currentFastaAcid on every iteration, save it into a list and print the list once at the end:

// needs java.util.List and java.util.ArrayList
List<String> acids = new ArrayList<>();

for (int i = 0; i + 3 <= fastaString.length(); i += 3) {
    currentFastaAcid = fastaString.substring(i, i + 3);
    acids.add(currentFastaAcid);
}

System.out.println(acids.toString());

Other tips

currentFastaAcid = fastaString.toString().substring(i, i + 3);

Replace it with:

currentFastaAcid = fastaString.substring(i, i + 3);

The toString method of StringBuilder creates a new String object every time you call it, and that String contains a copy of your entire large sequence. If you call substring directly on the StringBuilder, it copies only the small range you asked for. Also, remove System.out.println if you don't really need it.
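A small check of this claim, as a sketch (the class name `ToStringCopyDemo` is illustrative). The StringBuilder documentation states that toString allocates a new String on each call, so two calls give distinct objects with equal contents, while substring on the StringBuilder returns only the requested characters.

```java
public class ToStringCopyDemo {
    public static void main(String[] args) {
        StringBuilder sequence = new StringBuilder("AGCTTTTCA");

        // Each toString() call allocates a new String holding the whole sequence.
        String first = sequence.toString();
        String second = sequence.toString();
        System.out.println(first == second);      // false: two separate objects
        System.out.println(first.equals(second)); // true: same contents

        // substring(start, end) on the StringBuilder copies only that range.
        String codon = sequence.substring(3, 6);
        System.out.println(codon + " " + codon.length()); // TTT 3
    }
}
```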

Besides the debug output, your main problem is that you create a new String containing all the data read from the file on every iteration of your loop:

currentFastaAcid = fastaString.toString().substring(i, i + 3);

fastaString.toString() gives the same result on every iteration and is therefore redundant. Move it outside the loop and you should save several seconds of runtime.
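Hoisting the conversion out of the loop could look like the sketch below. The class name `HoistedToString` and the codon-counting task are illustrative assumptions; the point is that toString() runs once, so the whole sequence is copied once in total rather than once per iteration.

```java
public class HoistedToString {
    // Counts how many complete three-base groups equal the given codon.
    static int countCodon(StringBuilder fastaString, String target) {
        // Convert once, outside the loop: a single copy of the whole sequence.
        String sequence = fastaString.toString();
        int count = 0;
        for (int i = 0; i + 3 <= sequence.length(); i += 3) {
            // Each substring call now creates only a three-character String.
            if (sequence.substring(i, i + 3).equals(target)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        StringBuilder fastaString = new StringBuilder("AGCTTTAGCTCAAGC");
        System.out.println(countCodon(fastaString, "AGC")); // 3
    }
}
```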

Apart from the suggested optimizations to the serial code, I would use parallel processing to reduce the time further. If you have a really big file, you can split reading the file and processing the lines that have been read into separate threads. That way, while one thread is busy reading the next line from the large file, another thread can process the lines already read and print them to the console.
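The reader/processor split described above could be sketched with a `BlockingQueue` from `java.util.concurrent`. This is a minimal sketch under assumptions not in the original answer: the class and method names are hypothetical, the input is any `BufferedReader` (a `StringReader` in the demo instead of the real file), the first line is treated as the FASTA description and skipped, and the consumer carries over up to two leftover bases when a group of three spans a line break. Whether it actually helps depends on whether reading or processing dominates, so it is worth measuring against the simpler serial fixes first.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PipelinedCodonSplitter {
    // Sentinel marking the end of input. It is compared by reference (==),
    // so no line returned by readLine() can be mistaken for it.
    private static final String END = new String("<END>");

    // Reads lines on the calling thread while a second thread splits them into codons.
    static List<String> splitCodons(BufferedReader reader)
            throws IOException, InterruptedException {
        BlockingQueue<String> lines = new ArrayBlockingQueue<>(1024);
        List<String> codons = new ArrayList<>();

        Thread consumer = new Thread(() -> {
            // Holds up to two bases left over when a group spans a line break.
            StringBuilder pending = new StringBuilder();
            try {
                while (true) {
                    String line = lines.take();
                    if (line == END) {
                        break;
                    }
                    pending.append(line);
                    int usable = pending.length() - pending.length() % 3;
                    for (int i = 0; i < usable; i += 3) {
                        codons.add(pending.substring(i, i + 3));
                    }
                    pending.delete(0, usable);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            reader.readLine(); // skip the FASTA description line
            String line;
            while ((line = reader.readLine()) != null) {
                lines.put(line);
            }
        } finally {
            lines.put(END); // always release the consumer, even if reading fails
        }

        consumer.join(); // join() also makes the consumer's writes to codons visible here
        return codons;
    }

    public static void main(String[] args) throws Exception {
        String fasta = ">example description\nAGCTTTT\nCAAG\n";
        // prints [AGC, TTT, TCA]
        System.out.println(splitCodons(new BufferedReader(new StringReader(fasta))));
    }
}
```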

If you remove the

System.out.println(currentFastaAcid);

line in the for loop, you will save a decent amount of time.
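If the codons do need to be printed, an alternative to removing the output is to stop writing it one println call at a time. A sketch, with illustrative names (`BufferedCodonWriter`, `writeCodons`): output goes through a `BufferedWriter` with a 64 KiB buffer and is flushed once at the end. The demo writes to a `StringWriter` to stay self-contained; for console output you could pass `new OutputStreamWriter(System.out)` as the target instead.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class BufferedCodonWriter {
    // Writes one codon per line through a large buffer instead of one println per codon.
    static void writeCodons(CharSequence sequence, Writer target) throws IOException {
        BufferedWriter out = new BufferedWriter(target, 1 << 16);
        for (int i = 0; i + 3 <= sequence.length(); i += 3) {
            out.append(sequence, i, i + 3);
            out.write('\n'); // fixed separator, independent of the platform
        }
        out.flush(); // flush once; target is left open so System.out is not closed
    }

    public static void main(String[] args) throws IOException {
        StringWriter result = new StringWriter();
        writeCodons(new StringBuilder("AGCTTTTCA"), result);
        System.out.print(result); // AGC, TTT and TCA on separate lines
    }
}
```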

License: CC-BY-SA with attribution
Not affiliated with StackOverflow