Question

I am trying to read a binary file in Java using a BufferedReader. I wrote that file using "UTF-8" encoding. The code for writing the file:

  byte[] inMsgBin=null;
  try {
      inMsgBin = String.valueOf(cypherText).getBytes("UTF-8");
      //System.out.println("CIPHER TEXT:FULL:BINARY WRITE: "+inMsgBin);
  } catch (UnsupportedEncodingException ex) {
      Logger.getLogger(EncDecApp.class.getName()).log(Level.SEVERE, null, ex);
  }
  try (FileOutputStream out = new FileOutputStream(fileName + new SimpleDateFormat("yyyyMMddhhmm").format(new Date()) + ".encmsg")) {
      out.write(inMsgBin); // try-with-resources closes the stream automatically
  } catch (IOException ex) {
      Logger.getLogger(EncDecApp.class.getName()).log(Level.SEVERE, null, ex);
  }       


System.out.println("cypherText charCount="+cypherText.length());

Here 'cypherText' is a String with some content. The total number of characters written to the file is reported as 19. After writing, when I open the file in Notepad++, it shows some characters, and selecting all of the file's content gives a count of 19 characters.

Now I read the same file with a BufferedReader, using the following lines of code:

try
{
    DecMessage obj2 = new DecMessage();
    StringBuilder cipherMsg = new StringBuilder();

    try (BufferedReader in = new BufferedReader(new FileReader(filePath))) {
        String tempLine = "";
        fileSelect = true;
        while ((tempLine = in.readLine()) != null) {
            cipherMsg.append(tempLine);
        }
    }

    System.out.println("FROM FILE: charCount= " + cipherMsg.length());

Here the total number of characters read (printed as 'charCount') is 17 instead of 19.

How can I read all the characters of the file correctly?


Solution

Specify the same charset when reading the file.

   try (final BufferedReader br = Files.newBufferedReader(new File(filePath).toPath(),
                    StandardCharsets.UTF_8))
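
To illustrate why the charset matters, here is a minimal sketch that decodes the same UTF-8 bytes with two different charsets. The class name `CharsetMismatchDemo`, the helper `decodedLengths`, and the sample text are made up for this example; they are not taken from the question's code.

```java
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {
    // Encodes the text as UTF-8, then decodes the bytes twice:
    // once as UTF-8 and once as ISO-8859-1. Returns both lengths.
    static int[] decodedLengths(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        String asUtf8 = new String(bytes, StandardCharsets.UTF_8);
        String asLatin1 = new String(bytes, StandardCharsets.ISO_8859_1);
        return new int[] { asUtf8.length(), asLatin1.length() };
    }

    public static void main(String[] args) {
        // Hypothetical sample: 'm', the copyright sign, and U with grave.
        int[] lengths = decodedLengths("m\u00A9\u00D9");
        // Each non-ASCII character here takes two bytes in UTF-8,
        // so the single-byte charset sees five characters instead of three.
        System.out.println("UTF-8: " + lengths[0] + ", ISO-8859-1: " + lengths[1]);
    }
}
```

If the reader's charset differs from the one used for writing, the character count can change even though the bytes on disk are identical.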

UPDATE

Now I understand your problem. Thanks for the file.

Again: your file is still readable by any text editor such as Notepad++. Because your characters include extended and control characters, some of them show up as unreadable symbols, but the file is still UTF-8 encoded text.

Now back to your problem. There are two problems with your code.

  1. When reading the file, you should specify the correct charset. Readers are character readers: bytes are converted into characters while reading. If you specify a charset, the reader uses it; otherwise it uses the default charset. So you should create the BufferedReader as follows:

    try (final BufferedReader br = Files.newBufferedReader(new File(filePath).toPath(), StandardCharsets.UTF_8))

  2. Second, your text includes control characters. BufferedReader.readLine() treats a line feed (LF), a carriage return (CR), or a CR immediately followed by an LF as a line terminator, and it returns each line without that terminator. That is why you are getting 17 instead of 19: two of your characters are CRs. To avoid this, read the file character by character:

    int ch; while ((ch = br.read()) > -1) { buffer.append((char)ch); }
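
The difference between the two reading styles can be sketched without a file. The class `LineTerminatorDemo`, its two helper methods, and the six-character sample string are assumptions made up for this illustration; a `StringReader` stands in for the real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class LineTerminatorDemo {
    // Joins lines the way the question's loop does.
    // readLine() strips CR, LF, and CRLF, so they never reach the builder.
    static String readByLine(String text) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Copies every character, including CR and LF.
    static String readByChar(String text) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            int ch;
            while ((ch = br.read()) != -1) {
                sb.append((char) ch);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sample = "ia@\r\rm"; // hypothetical text with two CR characters
        System.out.println(readByLine(sample).length()); // 4: both CRs are dropped
        System.out.println(readByChar(sample).length()); // 6: nothing is dropped
    }
}
```

The two missing characters in the line-based version are exactly the two CRs, which matches the 17 versus 19 discrepancy described above.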

Overall, the method below returns the full text.

static String readCyberText() {
        StringBuilder buffer = new StringBuilder();
        try (final BufferedReader br = Files.newBufferedReader(new File("C:\\projects\\test2201404221017.txt").toPath(),
                StandardCharsets.UTF_8)){
            int ch;
            while ((ch = br.read()) > -1) {
                buffer.append((char)ch);
            }
            return buffer.toString();
        }
        catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }

You can test it with:

String s = readCyberText();
    System.out.println(s.length());
    System.out.println(s);

which outputs:

19
ia@

m©Ù6ë<«9K()il

Note: the length of the String is 19, but only 17 characters are visibly displayed, because the console treats the two CR characters as line breaks and continues the output on a new line. The String still contains all 19 characters.
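
One way to confirm that the control characters are really in the String is to print them in escaped form. This is only a sketch; the class `VisibleControlChars` and its `escape` helper are hypothetical names introduced here, not part of the original answer.

```java
class VisibleControlChars {
    // Replaces control characters with a visible escape so they can be counted by eye.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\r') {
                sb.append("\\r");
            } else if (c == '\n') {
                sb.append("\\n");
            } else if (Character.isISOControl(c)) {
                // Any other control character is shown as a hex code.
                sb.append(String.format("\\x%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sample = "ia@\r\rm"; // hypothetical text with two CR characters
        System.out.println(escape(sample)); // prints ia@\r\rm
    }
}
```

Passing the result of readCyberText() to such a helper would show the CR characters explicitly instead of letting the console interpret them.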

Licensed under: CC-BY-SA with attribution