The problem is a mismatch between the Java app's (i.e. the JVM's) default file encoding and the encoding of the input file.
The file's encoding is "ANSI", which on Windows machines commonly means Windows-1252 (or one of its variants).
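When the app is run from the command prompt, the JVM (and so, implicitly, the Scanner) uses the system default file encoding, which is Windows-1252 here. (This holds for JDK 17 and earlier; since JDK 18 the JVM default is UTF-8 regardless of the system setting.) Reading a Windows-1252 file with this setup does not cause the problem.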
However, NetBeans by default sets the project encoding to UTF-8, so when the app is run from NetBeans its file encoding is UTF-8. Reading the file with this encoding confuses the scanner. The character "ï" (0xEF) in the text "Caraïbes" is the cause of the problem. In UTF-8, 0xEF is the first byte of a three-byte sequence (the BOM, 0xEF 0xBB 0xBF, starts with the same byte), and the bytes that follow it here are not valid continuation bytes, so the decoder sees malformed input, which is what trips up the scanner.
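To see the mismatch in isolation, here is a minimal sketch that decodes the Windows-1252 bytes of "Caraïbes" with both charsets. The class and method names (MalformedInputDemo, decodesCleanly) are made up for illustration, and the word is written with a \u00EF escape so the example does not depend on the encoding of the source file itself.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MalformedInputDemo {

    // Returns true if the bytes decode without error in the given charset.
    // A decoder obtained from newDecoder() reports malformed input
    // (it throws) instead of silently replacing it.
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Charset ansi = Charset.forName("windows-1252");
        // "Caraïbes" as stored in the ANSI file: ï is the single byte 0xEF
        byte[] bytes = "Cara\u00EFbes".getBytes(ansi);

        System.out.println("JVM default charset:   " + Charset.defaultCharset());
        System.out.println("valid as windows-1252: " + decodesCleanly(bytes, ansi));
        System.out.println("valid as UTF-8:        " + decodesCleanly(bytes, StandardCharsets.UTF_8));
    }
}
```

The first line of output also shows which default encoding the JVM picked up, which is a quick way to compare a command-prompt run with a NetBeans run.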
As a solution,
either specify the scanner's encoding explicitly:
reader = new Scanner(file, "windows-1252");
or convert the input file to UTF-8 using Notepad (or, better, Notepad++) and set the scanner's encoding to UTF-8 explicitly instead of relying on the system default:
reader = new Scanner(file, "utf-8");
However, once different OSes are taken into account, using UTF-8 everywhere is the preferred way of dealing with multi-platform environments. Hence the second option is the way to go.
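If you would rather do the conversion in code than in an editor, the following is a rough sketch of one way to do it, assuming the source file really is Windows-1252. The names Utf8Conversion, convert and readLines are hypothetical, and the sample text is written to temporary files so the example runs on its own. It also checks Scanner.ioException(), because a Scanner treats an I/O error from its source as end of input rather than throwing it.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class Utf8Conversion {

    // Re-encodes a whole text file from one charset to another.
    static void convert(Path source, Charset from, Path target, Charset to) throws IOException {
        String text = new String(Files.readAllBytes(source), from);
        Files.write(target, text.getBytes(to));
    }

    // Reads all lines with an explicit charset and rethrows any
    // I/O error (including a decoding error) the Scanner kept to itself.
    static List<String> readLines(Path file, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Scanner reader = new Scanner(file.toFile(), charset.name())) {
            while (reader.hasNextLine()) {
                lines.add(reader.nextLine());
            }
            IOException swallowed = reader.ioException();
            if (swallowed != null) {
                throw swallowed;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Charset ansi = Charset.forName("windows-1252");
        // Temporary files with sample content stand in for the real input file.
        Path ansiFile = Files.createTempFile("input-ansi", ".txt");
        Path utf8File = Files.createTempFile("input-utf8", ".txt");
        Files.write(ansiFile, "Cara\u00EFbes\nAntilles\n".getBytes(ansi));

        convert(ansiFile, ansi, utf8File, StandardCharsets.UTF_8);
        System.out.println(readLines(utf8File, StandardCharsets.UTF_8));
    }
}
```

Passing the charset explicitly on both the write and the read side keeps the behaviour the same whether the app is started from the command prompt or from NetBeans.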