Question

One of the lines in a Java file I'm trying to understand is shown below.

return new Scanner(file).useDelimiter("\\Z").next();

The call is expected to return everything up to "The end of the input but for the final terminator, if any", as described in the java.util.regex.Pattern documentation. What actually happens is that it returns only the first 1024 characters of the file. Is this a limitation imposed by the regex Pattern matcher? Can it be overcome? For now I'm going ahead with a FileReader, but I would like to know the reason for this behaviour.

Solution

Try wrapping the File object in a FileInputStream and passing that stream to the Scanner.
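A minimal sketch of that suggestion follows. The class name ScannerStreamRead, the method name readAll and the choice of UTF-8 are assumptions made for illustration; they do not come from the original code.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Scanner;

// Hypothetical helper class, named for illustration only.
class ScannerStreamRead {

    // Reads the file through a Scanner backed by a FileInputStream.
    // UTF-8 is an assumption; use the encoding the file was written in.
    static String readAll(File file) throws IOException {
        try (InputStream in = new FileInputStream(file);
             Scanner scanner = new Scanner(in, "UTF-8")) {
            scanner.useDelimiter("\\Z");
            // An empty file has no token, so guard against NoSuchElementException.
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            System.out.println(readAll(new File(args[0])).length() + " characters read");
        }
    }
}
```

Note that \Z stops before a final line terminator, so a trailing newline in the file is not part of the returned string.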

Other tips

I couldn't reproduce this myself, but I think I can shed some light on what is going on.

Internally, the Scanner uses a character buffer of 1024 characters. By default it reads up to 1024 characters from your Readable, if that many are available, and then applies the pattern.

The problem is in your pattern. It will always match the end of the input, but that does not necessarily mean the end of your input stream or data. When Java applies your pattern to the buffered data, it tries to find the first occurrence of the end of input. Since 1024 characters are in the buffer, the matching engine treats position 1024 as the first match of the delimiter, and everything before it is returned as the first token.

I don't think the end-of-input anchor is valid for use in the Scanner for that reason. It could be reading from an infinite stream, after all.

Scanner is intended to read multiple primitives from a file. It really isn't intended to read an entire file.

If you don't want to include third-party libraries, you're better off looping over a BufferedReader that wraps a FileReader or InputStreamReader for text, or looping over a FileInputStream for binary data.
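The text case can be sketched roughly as follows. The class name BufferedFileRead, the method name readText, the 8192-character chunk size and the UTF-8 charset are all assumptions chosen for illustration.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named for illustration only.
class BufferedFileRead {

    // Reads the whole file as text by looping over a BufferedReader.
    // UTF-8 is an assumption; pass the charset the file actually uses.
    static String readText(File file) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] chunk = new char[8192];
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            int n;
            // read() returns -1 only at the real end of the stream.
            while ((n = reader.read(chunk)) != -1) {
                text.append(chunk, 0, n);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            System.out.println(readText(new File(args[0])).length() + " characters read");
        }
    }
}
```

Unlike the \Z delimiter approach, this keeps a trailing line terminator, because every character read is appended unchanged.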

If you're OK with using a third-party library, Apache commons-io has a FileUtils class that contains the static methods readFileToString and readLines for text, and readFileToByteArray for binary data.

You can still use the Scanner class; just specify a charset when opening the scanner, e.g.:

Scanner sc = new Scanner(file, "ISO-8859-1");

Java converts the bytes read from the file into characters using the specified charset, or the platform default (taken from the underlying OS) if none is given. It is still not clear to me why Scanner reads only 1024 bytes with the default charset, whereas with another one it reaches the end of the file. Anyway, it works fine!
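One possible explanation, which this thread does not confirm: the Scanner documentation states that if the underlying source throws an IOException, the scanner assumes the end of the input has been reached, and that the exception can be retrieved with ioException(). A decoding failure under the wrong charset might surface that way, and ISO-8859-1 would avoid it because every byte maps to a character. The sketch below reads with an explicit charset and rethrows any swallowed exception; the class name ScannerCharsetRead and the method name readAll are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

// Hypothetical helper class, named for illustration only.
class ScannerCharsetRead {

    // Reads the file with an explicit charset and surfaces any IOException
    // that the Scanner swallowed while reading.
    static String readAll(File file, String charsetName) throws IOException {
        try (Scanner sc = new Scanner(file, charsetName)) {
            sc.useDelimiter("\\Z");
            String content = sc.hasNext() ? sc.next() : "";
            // Scanner treats an IOException from its source as end of input;
            // ioException() returns the most recent one, or null if none occurred.
            IOException swallowed = sc.ioException();
            if (swallowed != null) {
                throw swallowed;
            }
            return content;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            System.out.println(readAll(new File(args[0]), args[1]).length() + " characters read");
        }
    }
}
```

If a read is cut short, calling this with the charset that was in effect at the time should show whether an exception was the cause.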

License: CC-BY-SA with attribution
Not affiliated with StackOverflow