Question

I have several dictionary files that I read in Java, line by line, using this code:

    public static void main(String args[]) {
        try {
            FileInputStream fstream1 = new FileInputStream("de-DE.dic");
            DataInputStream in = new DataInputStream(fstream1);
            BufferedReader br = new BufferedReader(new InputStreamReader(in, "UTF-8"));

            String str;
            while ((str = br.readLine()) != null) {
                String str_uc = str.toUpperCase(Locale.GERMAN);
                if (hasApostrophe(str_uc)) {
                    allletters.add(str_uc);
                    // Sort the word into a list according to its length
                    if (str.length() == 3)
                        threeletter.add(str_uc);
                    else if (str.length() == 4)
                        fourletter.add(str_uc);
                    else if (str.length() == 5)
                        fiveletter.add(str_uc);
                    else if (str.length() == 6)
                        sixletter.add(str_uc);
                    else if (str.length() == 7)
                        sevenletter.add(str_uc);
                }
            }
            in.close();
        }
        catch (Exception e) {
            System.err.println(e);
        }
    }

However, it always adds one empty character to the word on the first line. For example, if the first line holds a three-letter word, it is added to the fourletter list. How can I prevent this from happening? Thanks.

ADDITION:

Here are a few lines from the file:

Aachens
Aachen
Aal
Aale
Aalen
Aales
Aals
Aas
Aases
Aasgeier
Aasgeiern
Aasgeiers

Solution

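EDIT: It's a Notepad issue. Notepad writes a byte order mark (BOM) at the start of files it saves as UTF-8, and Java's UTF-8 decoder passes it through as an extra character (U+FEFF) at the start of the first line.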

Use

String str_uc=str.trim().toUpperCase(Locale.GERMAN);

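trim() removes whitespace characters (code points up to U+0020) at the start and end of the line. Be aware that it does not remove a byte order mark (U+FEFF); if the extra character is a BOM, strip it explicitly or save the file without one.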
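
As a rough sketch of stripping the character explicitly, the following assumes the stray character is a UTF-8 byte order mark, which decodes to U+FEFF. The names BomExample, stripBom and readWords are made up for illustration, and a StringReader stands in for the real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BomExample {
    // The character a UTF-8 byte order mark decodes to (assumed cause of the problem).
    static final char BOM = '\uFEFF';

    // Removes a single leading BOM if present; otherwise returns the line unchanged.
    static String stripBom(String line) {
        if (!line.isEmpty() && line.charAt(0) == BOM) {
            return line.substring(1);
        }
        return line;
    }

    // Reads all lines, stripping a BOM from the first line only, and upper-cases them.
    static List<String> readWords(Reader source) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            boolean first = true;
            while ((line = br.readLine()) != null) {
                if (first) {
                    line = stripBom(line);
                    first = false;
                }
                words.add(line.toUpperCase(Locale.GERMAN));
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Simulated file content that starts with a BOM.
        List<String> words = readWords(new StringReader("\uFEFFAal\nAale\n"));
        System.out.println(words + ", first word length = " + words.get(0).length());
    }
}
```

With a real file, the Reader passed to readWords would be the InputStreamReader from the question, opened with "UTF-8".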

OTHER TIPS

Probably the correct fix

Java has two sorts of streams:

Binary streams (InputStream and its subclasses) - these can read any sort of data, but in order to interpret the data you have to know what sort it is

Text streams (Reader and its subclasses) - here it is agreed that you are reading text

What you are doing is opening a FileInputStream, which is a binary stream, wrapping it in a DataInputStream, and then wrapping that in an InputStreamReader and a BufferedReader.

The more correct approach is to open a text stream (a Reader) in the first place, and then decorate it with a BufferedReader if you need readLine() or better performance. FileReader is the simplest choice, but before Java 11 it always uses the platform default encoding, so for a UTF-8 file you can keep the InputStreamReader with an explicit charset and just drop the DataInputStream.

See the Java API documentation for FileReader.

The DataInputStream is meant for data that was written in binary format: its readUTF() method, for example, expects a length indication before the actual string. It adds nothing when reading a plain text file. Note that if the unexpected character is a byte order mark stored in the file itself, changing the stream classes will not remove it.
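
A minimal sketch of the text-stream approach, assuming Java 7 or later (for try-with-resources and java.nio.file). The class and method names are illustrative, and the charset is passed explicitly because the file in the question is UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class DictionaryReader {
    // Reads every line of a UTF-8 text file using a character stream only.
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        // Files.newBufferedReader returns a BufferedReader that decodes with the given charset.
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // "de-DE.dic" is the file name used in the question.
        for (String word : readLines(Paths.get("de-DE.dic"))) {
            System.out.println(word + " (" + word.length() + ")");
        }
    }
}
```

The try-with-resources block closes the reader even if reading fails, which the original code did not guarantee.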

Quick fix

Also, if you don't want to switch, you can always call the trim() method of the String class, which removes leading and trailing whitespace (but not a BOM character, U+FEFF).

Another quick fix

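Use Scanner instead of all the streams you're opening. Scanner accepts a File (not a file name string, which it would scan as the text itself) and opens the file for you. You can then use its next() or nextLine() methods. It also has quite advanced parsing abilities; see the Scanner API documentation.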
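
A short sketch of the Scanner variant, assuming one word per line as in the sample from the question. ScannerExample and readTokens are illustrative names; the Scanner(File, String) constructor takes the charset name as its second argument.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerExample {
    // Collects whitespace-separated tokens from any Scanner.
    static List<String> readTokens(Scanner scanner) {
        List<String> tokens = new ArrayList<>();
        while (scanner.hasNext()) {
            tokens.add(scanner.next());
        }
        return tokens;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Pass a File, not a String: new Scanner("de-DE.dic") would scan the name itself.
        try (Scanner scanner = new Scanner(new File("de-DE.dic"), "UTF-8")) {
            System.out.println(readTokens(scanner).size() + " words read");
        }
    }
}
```

Because next() splits on whitespace, it fits a one-word-per-line dictionary; for lines that may contain spaces, nextLine() would be the closer match.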

Following PC.'s answer, I suggest converting your file's encoding in Notepad++ to the following:

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow