DataInputStream readLine() deprecated

Question 1

I'm not 100% sure, but I found an example of the method working incorrectly as compared to BufferedReader.readLine(). Here's the code:

import java.io.*;
import java.nio.charset.StandardCharsets;

public class HelloWorld {
  public static void main(String[] args) throws Exception {
    String s = "喜\n";
    // Encode explicitly so the result doesn't depend on the platform default charset
    byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);

    DataInputStream d = new DataInputStream(new ByteArrayInputStream(utf8));
    System.out.println(d.readLine()); // prints å (followed by two control characters)

    BufferedReader br = new BufferedReader(
        new InputStreamReader(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8));
    System.out.println(br.readLine()); // prints 喜
  }
}

Basically, DataInputStream.readLine() doesn't appear to handle multi-byte characters at all, since it effectively does char next = (char)in.read(); for each byte it reads.
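To see what that cast does to the three UTF-8 bytes of 喜, here is a small sketch. The class name ByteCastDemo and the helper castEachByte are made up for illustration; the helper only imitates the per-byte cast, it is not the JDK source.

```java
import java.nio.charset.StandardCharsets;

class ByteCastDemo {
  // Hypothetical helper that imitates the per-byte cast:
  // every byte is widened to its own char, with no decoding step.
  static String castEachByte(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append((char) (b & 0xFF)); // same value in.read() would return for this byte
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    byte[] utf8 = "\u559C".getBytes(StandardCharsets.UTF_8); // 喜 encodes as E5 96 9C
    String cast = castEachByte(utf8);
    String decoded = new String(utf8, StandardCharsets.UTF_8);

    System.out.println(cast.length());                       // 3
    System.out.println(decoded.length());                    // 1
    System.out.println(Integer.toHexString(cast.charAt(0))); // e5, i.e. å
  }
}
```

One character goes in, three unrelated characters come out, and only the first of them (å) is printable.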

Essentially, I think you need at least a very small buffer in order to properly read multi-byte characters. That said, you could probably build your custom method on top of InputStreamReader directly instead of BufferedReader, since that will properly handle multi-byte characters. Alternatively, if you know you're always going to be dealing with ASCII, then you're probably safe using the deprecated method.
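A rough sketch of what such a custom method could look like, assuming UTF-8 input. ReaderLineDemo and its readLine helper are hypothetical names, and I wrap the InputStreamReader in a PushbackReader only to get one char of lookahead for \r\n. Keep in mind that, per its Javadoc, InputStreamReader may itself read ahead more bytes from the underlying stream than the current read needs, so this avoids BufferedReader but not all buffering.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.nio.charset.StandardCharsets;

class ReaderLineDemo {
  // Hypothetical helper: reads one line char by char.
  // Accepts \n, \r and \r\n as terminators; returns null at end of input.
  static String readLine(PushbackReader r) throws IOException {
    int c = r.read();
    if (c == -1) {
      return null;
    }
    StringBuilder sb = new StringBuilder();
    while (c != -1 && c != '\n') {
      if (c == '\r') {
        int next = r.read();
        if (next != '\n' && next != -1) {
          r.unread(next); // not part of the terminator, so give it back
        }
        break;
      }
      sb.append((char) c);
      c = r.read();
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "\u559C\r\nabc\n".getBytes(StandardCharsets.UTF_8);
    PushbackReader r = new PushbackReader(
        new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8));

    System.out.println("\u559C".equals(readLine(r))); // true
    System.out.println(readLine(r));                  // abc
    System.out.println(readLine(r));                  // null
  }
}
```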

EDIT: it's also worth noting that even DataInputStream keeps a small internal buffer in order to properly handle \r\n line endings. In JDK 7, at least, the handling for \r is:

          case '\r':
            int c2 = in.read();
            if ((c2 != '\n') && (c2 != -1)) {
                if (!(in instanceof PushbackInputStream)) {
                    in = new PushbackInputStream(in);
                }
                ((PushbackInputStream)in).unread(c2);
            }
            break loop;

Thus, if we encounter something like \ra, the a is unread back onto a pushback input stream, which maintains an internal buffer of unread bytes.
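Here is a small standalone sketch of that pushback behaviour. PushbackDemo and skipLineFeed are names I made up; the helper mirrors the logic of the snippet above rather than copying the JDK code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

class PushbackDemo {
  // Hypothetical helper, meant to be called right after a '\r' was read:
  // swallows a following '\n', and pushes any other byte back onto the stream.
  static void skipLineFeed(PushbackInputStream in) throws IOException {
    int next = in.read();
    if (next != '\n' && next != -1) {
      in.unread(next);
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] data = {'\r', 'a', '\r', '\n', 'b'};
    PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(data));

    in.read();                            // consumes the first '\r'
    skipLineFeed(in);                     // reads 'a', then unreads it
    System.out.println((char) in.read()); // a

    in.read();                            // consumes the second '\r'
    skipLineFeed(in);                     // swallows the '\n'
    System.out.println((char) in.read()); // b
  }
}
```

The default PushbackInputStream constructor gives a one-byte pushback buffer, which is all that is needed here.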

Question 2

When they say readLine() was deprecated because it doesn't properly convert bytes to characters, they mean that it does not let you specify the character encoding, for instance UTF-8 vs. CP1252. Instead, it treats every byte as one character, so non-ASCII text written in a multi-byte encoding such as UTF-8 will be read back incorrectly.
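For comparison, a Reader lets you name the encoding explicitly. The sketch below decodes the same bytes with two different charsets to show how much the choice matters. CharsetDemo and firstLine are hypothetical names, and I use ISO-8859-1 rather than CP1252 only because it is one of the charsets every Java platform is required to support.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
  // Hypothetical helper: reads the first line of the data using the given charset.
  static String firstLine(byte[] data, Charset cs) throws IOException {
    try (BufferedReader br = new BufferedReader(
        new InputStreamReader(new ByteArrayInputStream(data), cs))) {
      return br.readLine();
    }
  }

  public static void main(String[] args) throws IOException {
    // "café" written as UTF-8: the é becomes the two bytes C3 A9
    byte[] data = "caf\u00E9\n".getBytes(StandardCharsets.UTF_8);

    String asUtf8 = firstLine(data, StandardCharsets.UTF_8);
    String asLatin1 = firstLine(data, StandardCharsets.ISO_8859_1);

    System.out.println(asUtf8.equals("caf\u00E9"));         // true: café
    System.out.println(asLatin1.equals("caf\u00C3\u00A9")); // true: cafÃ©
  }
}
```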

So, do you need to worry about it? Sure. Methods are deprecated to warn developers that they may go away in a future release. That said, according to the Javadoc, readLine() was deprecated in JDK 1.1, which was a LOOONG time ago.

As to your point of not wanting a BufferedReader because of buffering, I'd say don't use it. Use one of the other classes that extend Reader or, if you want to be that extreme, roll your own. There is nothing stopping you from creating your own class called DataInputReader, tacking on methods to read your primitives, and providing a proper readLine() implementation to suit your needs.
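As a rough idea of what rolling your own could look like, here is a sketch that reads a line straight from an InputStream without reading past the line terminator, so primitives can still be read from the same stream afterwards. UnbufferedLineDemo and its readLine helper are hypothetical. It assumes an encoding in which the byte 0x0A only ever means line feed (true for UTF-8 and ISO-8859-1, not for UTF-16), and it does not treat a bare \r as a line ending.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class UnbufferedLineDemo {
  // Hypothetical helper: collects raw bytes up to '\n', then decodes the whole
  // line at once, so multi-byte characters are never split. Returns null at EOF.
  static String readLine(InputStream in, Charset cs) throws IOException {
    int b = in.read();
    if (b == -1) {
      return null;
    }
    ByteArrayOutputStream line = new ByteArrayOutputStream();
    while (b != -1 && b != '\n') {
      line.write(b);
      b = in.read();
    }
    String s = new String(line.toByteArray(), cs);
    // Drop the '\r' of a "\r\n" terminator, if present
    return s.endsWith("\r") ? s.substring(0, s.length() - 1) : s;
  }

  public static void main(String[] args) throws IOException {
    // Build a stream holding a text line followed by a binary int
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buf);
    out.write("\u559C\r\n".getBytes(StandardCharsets.UTF_8));
    out.writeInt(42);

    DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
    String line = readLine(in, StandardCharsets.UTF_8);
    System.out.println("\u559C".equals(line)); // true
    System.out.println(in.readInt());          // 42
  }
}
```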

However, if you are reading binary encoded data, I would recommend NOT using a Reader at all, and sticking with an InputStream so you can read raw bytes and handle the conversions yourself. Readers were designed with character encoding in mind, and as such they tend to modify what you are reading, on the premise that the binary data should be converted to character strings.
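One common way to handle the conversion yourself is to store text as a length-prefixed block of bytes and decode it explicitly. This is only a sketch under that assumption: the format (a 4-byte length followed by UTF-8 bytes) and the names RawBytesDemo and readLengthPrefixed are invented for the example, not something your data necessarily uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class RawBytesDemo {
  // Hypothetical format: a 4-byte length, then that many UTF-8 bytes.
  static String readLengthPrefixed(DataInputStream in) throws IOException {
    int length = in.readInt();
    byte[] raw = new byte[length];
    in.readFully(raw); // blocks until all bytes are read, or throws EOFException
    return new String(raw, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    // Write a string in the assumed format, followed by a binary double
    byte[] text = "\u559C".getBytes(StandardCharsets.UTF_8);
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buf);
    out.writeInt(text.length);
    out.write(text);
    out.writeDouble(1.5);

    DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
    System.out.println("\u559C".equals(readLengthPrefixed(in))); // true
    System.out.println(in.readDouble());                         // 1.5
  }
}
```

Because the bytes are only decoded after they have all been read, the surrounding binary fields are never touched by any character conversion.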