Question

I am working on a small program to find text in a text file but I am getting a different result depending how I run my program.

When running my program from Netbeans I get 866 matches.

When running my program by double clicking on the .jar file in the DIST folder, I get 1209 matches (The correct number)

It seems that when I'm running the program from Netbeans, it doesn't get to the end of the text file. Is that to be expected?

Text File in question

Here is my code for reading the file:

@FXML
public void loadFile(){
    // Load the file line by line, then collect the lines starting with "AC#"
    try {
        linelist.clear();
        aclist.clear();
        reader = new Scanner(new File(filepathinput));
        while (reader.hasNextLine()) {   // hasNextLine(), not hasNext(), so trailing blank lines are read too
            linelist.add(reader.nextLine());
        }
        // i < linelist.size(), not size()-1, so the last line is not skipped
        for (int i = 0; i < linelist.size(); i++) {
            if (linelist.get(i).startsWith("AC#")) {
                aclist.add(linelist.get(i));
            }
        }
    }
    catch (java.io.FileNotFoundException e) {
        System.out.println(e);
    }
    finally {
        String accountString = String.valueOf(aclist.size());
        account.setText(accountString);
        if (reader != null) {            // reader is null if the file was not found
            reader.close();
        }
    }
}

Solution

The problem is an incompatibility between the Java application's (i.e. the JVM's) default file encoding and the input file's encoding.

The file's encoding is "ANSI", which on Windows machines commonly maps to the Windows-1252 encoding (or one of its variants).

When running the app from the command prompt (or by double-clicking the jar), the JVM, and therefore the Scanner implicitly, uses the system's default file encoding, which is Windows-1252. Reading a file in that same encoding does not cause the problem.

However, NetBeans by default sets the project encoding to UTF-8, so when the app is run from NetBeans its file encoding is UTF-8. Reading the file with this encoding confuses the scanner. The character "ï" (0xEF) in the text "Caraïbes" is the cause of the problem: 0xEF is the first byte of the UTF-8 BOM sequence (0xEF 0xBB 0xBF), and in UTF-8 it announces a three-byte sequence. Since the bytes that follow it in the file are not valid UTF-8 continuation bytes, the decoder trips up at that point.
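To see why that single byte matters, here is a small self-contained sketch (class and method names are mine, not from the original code): decoding the Windows-1252 bytes of "Caraïbes" succeeds with windows-1252 but fails with UTF-8, because 0xEF starts a three-byte UTF-8 sequence whose continuation bytes never arrive.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Decode the given bytes with the given charset.
    // Returns null if the bytes are malformed for that charset.
    static String decode(byte[] bytes, Charset cs) {
        try {
            return cs.newDecoder()
                     .onMalformedInput(CodingErrorAction.REPORT) // fail instead of silently replacing
                     .decode(ByteBuffer.wrap(bytes))
                     .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // "Caraïbes" encoded in Windows-1252: 'ï' is the single byte 0xEF
        byte[] win1252 = {0x43, 0x61, 0x72, 0x61, (byte) 0xEF, 0x62, 0x65, 0x73};

        // Decoded as Windows-1252 this yields "Caraïbes";
        // decoded as UTF-8 it is malformed (0xEF expects two continuation bytes).
        System.out.println(decode(win1252, Charset.forName("windows-1252")));
        System.out.println(decode(win1252, StandardCharsets.UTF_8));
    }
}
```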

As a solution, either specify the scanner's encoding explicitly:

reader = new Scanner(file, "windows-1252");

or convert the input file to UTF-8 (using Notepad, or better, Notepad++) and set the scanner's encoding to UTF-8 explicitly instead of relying on the system default:

reader = new Scanner(file, "utf-8");

However, when different OSes are considered, working with UTF-8 everywhere is the preferred way of dealing with multi-platform environments, so the second approach is the way to go.
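Putting the fix together, here is a minimal sketch of the loading logic with an explicit charset, so the count no longer depends on the JVM's default encoding. It assumes the file has already been converted to UTF-8; the class and method names are illustrative, and the JavaFX fields from the question are left out.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class AccountCounter {
    // Count the lines starting with "AC#", reading with an explicit
    // charset instead of the JVM's default file encoding.
    static int countAccounts(File file) throws FileNotFoundException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the scanner even if reading fails
        try (Scanner reader = new Scanner(file, StandardCharsets.UTF_8.name())) {
            while (reader.hasNextLine()) {
                lines.add(reader.nextLine());
            }
        }
        int count = 0;
        for (String line : lines) {
            if (line.startsWith("AC#")) {
                count++;
            }
        }
        return count;
    }
}
```

The `Scanner(File, String charsetName)` constructor is used rather than `Scanner(File, Charset)` because the latter only exists since Java 10.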

OTHER TIPS

It can also depend on the filepathinput input. The jar and the NetBeans run might be referring to two different files, possibly with the same name in different locations. Can you give more information on the value of the filepathinput variable?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow