Question

When I open a file in Notepad, I see one continuous line without any carriage return/line feed.

I wrote a Java program to read the file. When I split the data from the file using \n or System.getProperty("line.separator"), I see lots of lines.

I found in a hex editor that the file uses '0A' for new lines (as used on UNIX), which appears as a rectangle in Notepad.

My question is: if the file doesn't contain '0D' and '0A' (used on Windows for carriage return and line feed), how is my Java program splitting the data into lines? It should not split it.

Does anyone have any idea?


Solution

Java internally works with Unicode.

The Unicode standard defines a number of characters that conforming applications should recognize as line terminators:
LF: Line Feed, U+000A
VT: Vertical Tab, U+000B
FF: Form Feed, U+000C
CR: Carriage Return, U+000D
CR+LF: CR (U+000D) followed by LF (U+000A)
NEL: Next Line, U+0085
LS: Line Separator, U+2028
PS: Paragraph Separator, U+2029

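(Source: http://en.wikipedia.org/wiki/Newline) \n is LF (U+000A), so splitting on \n breaks the text at every 0A byte, whether or not a 0D precedes it. That's why your program sees \n as a newline.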
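
As a rough illustration of the point above, here is a minimal Java sketch. The class and method names (SplitDemo, splitOnLf, splitOnAnyLinebreak) are made up for this example. It assumes Java 8 or later, where the regex \R matches any of the line terminators listed above, including the CR+LF pair.

```java
import java.util.Arrays;

public class SplitDemo {
    // Splits only on the literal LF character (U+000A).
    static String[] splitOnLf(String text) {
        return text.split("\n");
    }

    // Splits on any Unicode linebreak sequence using the regex \R (Java 8+).
    // CR+LF is treated as a single terminator.
    static String[] splitOnAnyLinebreak(String text) {
        return text.split("\\R");
    }

    public static void main(String[] args) {
        String unixText = "123\n456\n789";        // LF-only, like the file in the question
        String mixedText = "123\r\n456\n789";     // CR+LF followed by a bare LF

        System.out.println(Arrays.toString(splitOnLf(unixText)));
        System.out.println(Arrays.toString(splitOnAnyLinebreak(mixedText)));
    }
}
```

If the input may come from different platforms, splitting on \R is probably the less surprising choice, since it does not depend on the line.separator of the machine running the program.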

Other tips

The character \n is 0a (line feed); 0d is the carriage return. If you split text that uses Windows line separators on \n only, you'll split on the 0a, leaving the 0d characters behind.

Notepad shows 0a as a square, but it will render 0d0a as a newline.

Here's an example using Scala (it's Java under the covers) on Windows:

scala> "123\n456".split(System.getProperty("line.separator")).length
res1: Int = 1

scala> "123\n456".split("\r\n").length  // same as the line above on Windows
res2: Int = 1

scala> "123\n456".split("\n").length
res3: Int = 2
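
The same behaviour can be sketched in plain Java. The class and helper names below (CrlfDemo, countParts, firstPartEndsWithCr) are hypothetical and exist only for this illustration. The last call shows the leftover 0d mentioned above when CR+LF text is split on \n alone.

```java
public class CrlfDemo {
    // Returns how many pieces the text yields when split on the given separator regex.
    static int countParts(String text, String separatorRegex) {
        return text.split(separatorRegex).length;
    }

    // True if splitting on LF alone leaves a trailing CR on the first piece.
    static boolean firstPartEndsWithCr(String text) {
        return text.split("\n")[0].endsWith("\r");
    }

    public static void main(String[] args) {
        String lfOnly = "123\n456";
        String crlf = "123\r\n456";

        System.out.println(countParts(lfOnly, "\r\n"));   // no CR+LF in the text, so nothing to split on
        System.out.println(countParts(lfOnly, "\n"));     // splits at the single LF
        System.out.println(firstPartEndsWithCr(crlf));    // the CR stays attached to "123"
    }
}
```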

Windows Notepad is something to be strongly avoided when dealing with any type of text file.
I suggest using Notepad++.

Not only will it display your text nicely, but it also has a feature to encode the file in UTF-8 without a BOM :D

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow