Question

This is a pretty trivial thing: Replace a line

  • possibly containing trailing blanks
  • ended by '\n', '\r', '\r\n' or nothing

by a line containing no trailing blanks and ended by '\n'.

I thought I could do it with a simple regex. Here, "\\s+$" doesn't work, as the $ matches before the final \n. That's why there's \\z. At least, so I thought. But

"\n".replaceAll("\\s*\\z", "\n").length()

returns 2. Actually, $, \\z, and \\Z do exactly the same thing here. I'm confused...


The explanation by Alan Moore was helpful, but it only now occurred to me that, to replace arbitrary trailing whitespace at EOF, I can use

replaceFirst("\\s*\\z", "\n");

instead of replaceAll. A simple solution doing all the things described above is

replaceAll("(?<!\\s)\\s*\\z|[ \t]*(\r?\n|\r)", "\n");

I'm afraid it's not very fast, but it's acceptable.
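For illustration, here is a minimal sketch of that combined pattern applied to a multi-line string. The class name LineEndingNormalizer and the method name normalize are made up for this example; only the regex comes from the post.

```java
class LineEndingNormalizer {
    // First alternative: trailing whitespace (possibly empty) at the very end of the input.
    // Second alternative: blanks followed by any of \r\n, \n or \r inside the input.
    private static final String REGEX = "(?<!\\s)\\s*\\z|[ \t]*(\r?\n|\r)";

    // Hypothetical helper name, not part of any library.
    static String normalize(String text) {
        return text.replaceAll(REGEX, "\n");
    }

    public static void main(String[] args) {
        String messy = "a  \r\nb\t\rc ";
        String clean = normalize(messy);
        // Show the escapes so the result is visible on one line.
        System.out.println(clean.replace("\n", "\\n")); // a\nb\nc\n
    }
}
```

Every line ends up with a single '\n' and no trailing blanks, including the last line, which had no terminator at all.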


Solution

Actually, the \z is irrelevant. On the first match attempt, \s* consumes the linefeed (\n) and \z succeeds because it's now at the end of the string. So it replaces the linefeed with a linefeed, then it tries to match at the position after the linefeed, which is the end of the string. It matches again because \s* is allowed to match the empty string, so it replaces the empty string with another linefeed.

You might expect it to go on matching nothing and replacing it with infinite linefeeds, but that can't happen. Unless you reset it, the regex can't match twice at the same position. Or more accurately, starting at the same position. In this case, the first match started at position #0, and the second at position #1.
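To make the two matches visible, here is a small sketch that lists where each match starts and ends. The class name MatchPositions and the helper spans are invented for this example; Pattern, Matcher.find(), start() and end() are the standard java.util.regex calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchPositions {
    // Returns "start-end" for every match of regex in input, in order.
    static List<String> spans(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        // First match consumes the linefeed (0-1), second is the empty match at the end (1-1).
        System.out.println(spans("\\s*\\z", "\n")); // [0-1, 1-1]
        // Two matches, two replacements, hence a result of length 2.
        System.out.println("\n".replaceAll("\\s*\\z", "\n").length()); // 2
    }
}
```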

By the way, \s+$ should match the string "\n"; $ can match the very end of the string as well as before a line separator at the end of the string.
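A quick way to check this is the sketch below, with DollarAtEnd and stripTrailing as made-up names for the example. It removes whatever \s+$ matches, so an empty result shows that the linefeed was matched.

```java
class DollarAtEnd {
    // Removes whatever "\\s+$" matches; hypothetical helper for this demonstration.
    static String stripTrailing(String s) {
        return s.replaceAll("\\s+$", "");
    }

    public static void main(String[] args) {
        // \s+ consumes the linefeed and $ then matches at the very end of the string.
        System.out.println(stripTrailing("\n").length());      // 0
        // Trailing blanks and the final linefeed are removed together.
        System.out.println(stripTrailing("abc  \n").length()); // 3
    }
}
```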

Update: In order to handle both cases, (1) getting rid of unwanted whitespace at the end of the line and (2) adding a linefeed in cases where there's no unwanted whitespace, I think your best bet is to use a lookbehind:

line = line.replaceAll("(?<!\\s)\\s*\\z", "\n");

This will still match every line, but it will only match once per line.
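As a rough check of the "once per line" behaviour, the sketch below runs the lookbehind version on a few inputs. TrailingBlankFixer and fix are hypothetical names chosen for the example.

```java
class TrailingBlankFixer {
    // The lookbehind (?<!\s) blocks a second, empty match right after the whitespace run.
    static String fix(String line) {
        return line.replaceAll("(?<!\\s)\\s*\\z", "\n");
    }

    public static void main(String[] args) {
        String[] samples = { "abc  ", "abc", "abc \r\n", "\n" };
        for (String s : samples) {
            // Print with visible escapes: every result ends in exactly one \n.
            System.out.println(fix(s).replace("\n", "\\n"));
        }
        // The case from the question now yields a single linefeed.
        System.out.println(fix("\n").length()); // 1
    }
}
```

After the first match the matcher resumes at the end of the string, where the preceding character is whitespace, so the lookbehind rejects the empty match that caused the doubled linefeed.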

Other suggestions

Could you just do something like the following?

 String result = myString.trim() + '\n';
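One thing to keep in mind with this approach: trim() strips whitespace from both ends, so leading indentation is lost as well. The sketch below compares it with a variant that only touches the end of the string; TrimCaveat, viaTrim and viaRegex are names invented for the example.

```java
class TrimCaveat {
    // The suggestion as written: strips leading AND trailing whitespace.
    static String viaTrim(String s) {
        return s.trim() + '\n';
    }

    // Alternative that removes only trailing whitespace, then appends the linefeed.
    static String viaRegex(String s) {
        return s.replaceFirst("\\s+$", "") + '\n';
    }

    public static void main(String[] args) {
        String line = "  abc \r\n";
        System.out.println(viaTrim(line).replace("\n", "\\n"));  // abc\n
        System.out.println(viaRegex(line).replace("\n", "\\n")); // "  abc\n", indentation kept
    }
}
```

If leading whitespace never matters for the input, the trim() version is the simpler choice.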
Licensed under: CC-BY-SA with attribution