Question

I need to use DataInputStream because I need the functionality of its deprecated readLine() method (I don't know the exact format of the input file, i.e. which line ending is used), but I also need to read binary-encoded primitives.

This is similar to this question:

Is there a class that exposes an unbuffered readLine method in Java?

My idea is to use something like this:

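import java.io.*;

public class SaveDataInputStream extends DataInputStream {
  public SaveDataInputStream(InputStream in) {super(in);}
  public String readLineSave() throws IOException {
    // ??? (this is the part in question)
    throw new UnsupportedOperationException("not implemented yet");
  }
}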

and to fill it with the body of readLine() from the DataInputStream class (this is similar to the accepted answer in the referenced question). However, I don't fully understand why the method was deprecated, and I would like to know whether the reason is relevant for my code.

The Javadoc says: "This method does not properly convert bytes to characters."

But what does that mean? Should I worry about that and what could happen in the worst case? Is it possible to write my own method that fixes the issue (efficiency is not really a concern)?

Hint: new BufferedReader(new InputStreamReader(..)); is not the correct answer, because the reader reads ahead and would swallow the binary data that follows the line.
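To illustrate the hint, here is a small sketch (the class and method names are hypothetical) that writes a text line followed by a binary int, reads the line through a BufferedReader, and then asks the underlying stream how many bytes are left. Because the reader fills its internal buffer from the stream, the int's bytes are already gone, so a subsequent readInt() on the DataInputStream would hit an EOFException.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class BufferedReaderSwallowsBytes {
    /** Reads one text line via BufferedReader and reports how many bytes the stream still holds. */
    static int bytesLeftAfterReadLine() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.write("header\n".getBytes(StandardCharsets.UTF_8));
        out.writeInt(42); // 4 binary bytes directly after the line
        out.flush();

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        reader.readLine(); // returns "header", but pulls everything available into the reader's buffer
        return in.available();
    }

    public static void main(String[] args) throws IOException {
        // With a ByteArrayInputStream the reader consumes all 11 bytes, so this prints 0
        System.out.println("bytes left: " + bytesLeftAfterReadLine());
    }
}
```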


Solution 2

I'm not 100% sure, but I found an example where the method behaves incorrectly compared to BufferedReader.readLine(). Here's the code:

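import java.io.*;
import java.nio.charset.StandardCharsets;

public class HelloWorld {
  @SuppressWarnings("deprecation") // DataInputStream.readLine() is deprecated
  public static void main(String[] args) throws Exception {
    String s = "喜\n";
    // Encode explicitly as UTF-8 so the result does not depend on the platform default;
    // 喜 becomes the three bytes E5 96 9C
    byte[] bytes = s.getBytes(StandardCharsets.UTF_8);

    DataInputStream d = new DataInputStream(new ByteArrayInputStream(bytes));
    System.out.println(d.readLine()); // prints å (plus two invisible control characters)

    BufferedReader br = new BufferedReader(
        new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8));
    System.out.println(br.readLine()); // prints 喜
  }
}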

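Basically, DataInputStream.readLine() doesn't handle multi-byte characters at all, because it essentially does char next = (char)in.read(); for each byte. Every byte therefore becomes a separate char in the range U+0000 to U+00FF, which amounts to decoding the input as ISO-8859-1.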

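Essentially, I think that you need at least a very small buffer in order to read multi-byte characters properly. That said, you could probably build your custom method on top of InputStreamReader directly instead of BufferedReader, since it handles multi-byte characters correctly. Note, however, that the InputStreamReader Javadoc warns that it may read more bytes ahead from the underlying stream than it needs, so it can swallow the binary data after the line just like BufferedReader. Alternatively, if you know you're always going to be dealing with ASCII (or ISO-8859-1) text, then you're probably safe using the deprecated method.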
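One way to get correct decoding without any read-ahead is to collect the raw bytes of a line and decode them only once the terminator has been found. Below is a minimal sketch that fills in the question's readLineSave() skeleton; it is not the JDK's code, and the constructor taking a Charset is an assumption added for illustration. It also assumes the charset is ASCII-compatible (UTF-8, ISO-8859-1, CP1252, ...), so that '\n' and '\r' are always single bytes that never occur inside a multi-byte character; it will not work for UTF-16. Like the JDK method, it treats "\r\n" as one terminator, so a line ending in a bare '\r' that is followed by binary data starting with byte 0x0A loses that byte.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.Charset;

/**
 * Sketch: a DataInputStream whose readLineSave() decodes each line with an explicit charset.
 * Assumes an ASCII-compatible charset, so '\n' and '\r' are always single bytes.
 */
public class SaveDataInputStream extends DataInputStream {
    private final PushbackInputStream pushback; // the same object DataInputStream reads from
    private final Charset charset;

    public SaveDataInputStream(InputStream in, Charset charset) {
        this(new PushbackInputStream(in, 1), charset);
    }

    private SaveDataInputStream(PushbackInputStream in, Charset charset) {
        super(in); // readInt(), readFully(), ... also go through the pushback stream
        this.pushback = in;
        this.charset = charset;
    }

    /** Returns the next line without its terminator, or null at end of stream. */
    public String readLineSave() throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b = pushback.read();
        if (b == -1) {
            return null; // nothing left to read
        }
        while (b != -1 && b != '\n') {
            if (b == '\r') {
                int next = pushback.read();
                if (next != '\n' && next != -1) {
                    pushback.unread(next); // bare '\r': give the peeked byte back
                }
                break;
            }
            line.write(b); // keep raw bytes; no per-byte char conversion
            b = pushback.read();
        }
        // Decode the whole line at once, so multi-byte sequences stay intact
        return new String(line.toByteArray(), charset);
    }
}
```

Because only one byte is ever pushed back, binary reads interleaved with readLineSave() see exactly the bytes that follow the line.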

EDIT: it's also worth noting that even DataInputStream.readLine() buffers internally (by one byte) in order to handle \r\n line endings properly. In JDK 7, at least, the handling for \r is:

          case '\r':
            int c2 = in.read();
            if ((c2 != '\n') && (c2 != -1)) {
                if (!(in instanceof PushbackInputStream)) {
                    in = new PushbackInputStream(in);
                }
                ((PushbackInputStream)in).unread(c2);
            }
            break loop;

Thus, if we encounter something like \ra, the a is unread back onto a pushback input stream, which maintains an internal buffer of unread bytes.
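To see the pushback in action, the sketch below (class and method names are hypothetical) writes a line terminated by a bare \r followed directly by a binary int, then reads both back with the deprecated method. It relies on the JDK implementation quoted above, where DataInputStream replaces its underlying stream with a PushbackInputStream; that is an implementation detail rather than documented behaviour.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class PushbackDemo {
    /** Reads a '\r'-terminated line, then the binary int that follows it. */
    @SuppressWarnings("deprecation") // DataInputStream.readLine() is deprecated
    static int readIntAfterCrLine() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.write("abc\r".getBytes(StandardCharsets.US_ASCII));
        out.writeInt(0x01020304); // first byte is 0x01, not '\n'
        out.flush();

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        String line = in.readLine(); // consumes "abc\r", peeks at 0x01 and unreads it
        if (!"abc".equals(line)) {
            throw new IllegalStateException("unexpected line: " + line);
        }
        // The peeked byte sits in the PushbackInputStream, so readInt() still sees all 4 bytes
        return in.readInt();
    }

    public static void main(String[] args) throws IOException {
        System.out.printf("0x%08X%n", readIntAfterCrLine()); // prints 0x01020304
    }
}
```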

OTHER TIPS

What they mean when they say readLine() was deprecated because it doesn't properly convert characters is that it ignores character encodings entirely: each byte is cast straight to a char, which amounts to ISO-8859-1, and there is no way to specify the real encoding, for instance UTF-8 vs. CP1252. This means that text written in a different encoding (multi-byte UTF-8 characters, or most CP1252 bytes in the range 0x80 to 0x9F) comes out garbled, regardless of the platform's default encoding.
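Since the deprecated method casts each byte to a char, nothing is actually lost: every resulting char is in U+0000 to U+00FF, and ISO-8859-1 maps each of those back to the original byte. So if you must keep using readLine(), one workaround is to re-encode its result as ISO-8859-1 and decode the bytes with the real charset. The sketch below assumes the file is UTF-8 (readUtf8Line is a hypothetical helper); it works because the bytes 0x0A and 0x0D never occur inside a UTF-8 multi-byte sequence, so readLine() cannot split a character.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class RedecodeLine {
    /** Reads one line with the deprecated readLine() and re-decodes it as UTF-8. */
    @SuppressWarnings("deprecation") // DataInputStream.readLine() is deprecated
    static String readUtf8Line(DataInputStream in) throws IOException {
        String raw = in.readLine(); // one char per byte, each in U+0000..U+00FF
        if (raw == null) {
            return null; // end of stream
        }
        // ISO-8859-1 maps every char in that range back to exactly the original byte
        byte[] original = raw.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "喜\nnext\n".getBytes(StandardCharsets.UTF_8);
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        System.out.println(readUtf8Line(in)); // 喜
        System.out.println(readUtf8Line(in)); // next
    }
}
```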

So, do you need to worry about it? Sure. Methods are deprecated to warn developers that they may go away in the future. That said, according to the Javadoc, readLine() was deprecated in JDK 1.1, which was a very long time ago, and it is still present in current JDKs.

As to your point of not wanting a BufferedReader because of buffering, I'd say don't use it. Use one of the other classes that extend Reader or, if you want to be that extreme, roll your own. There is nothing stopping you from creating your own class called DataInputReader, tacking on methods to read your primitives, and providing a proper readLine() implementation to suit your needs.

However, if you are reading binary-encoded data, I would recommend NOT using a Reader at all and sticking with an InputStream, so you can read raw bytes and handle the conversions yourself. Readers were designed with character encoding in mind, and as such they tend to modify what you are reading, because they assume the bytes are text that should be converted into character strings.
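A small illustration of how decoding can modify binary data: the sketch below (names are hypothetical) uses String(byte[], Charset) as a stand-in for a UTF-8 Reader, since both replace malformed input with U+FFFD. The byte 0xFF is not valid UTF-8, so after a decode/encode round trip it has turned into the three bytes EF BF BD.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyRoundTrip {
    /** Decodes raw bytes as UTF-8 text and encodes them back, as a Reader/Writer pair would. */
    static byte[] roundTrip(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8); // malformed bytes become U+FFFD
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xFF, 0x00, 0x01};
        System.out.println(Arrays.toString(raw));            // [-1, 0, 1]
        System.out.println(Arrays.toString(roundTrip(raw))); // [-17, -65, -67, 0, 1]
    }
}
```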

Licensed under: CC-BY-SA with attribution