Question

I have this input String (containing tabs, spaces, and line breaks):


        That      is a test.              
    seems to work       pretty good? working.








    Another test  again.

[Edit]: I should have provided the String itself for better testing, since Stack Overflow strips special characters (tabs, ...) from the post:

String testContent = "\n\t\n\t\t\t\n\t\t\tDas      ist ein Test.\t\t\t  \n\tsoweit scheint das \t\tganze zu? funktionieren.\n\n\n\n\t\t\n\t\t\n\t\t\t      \n\t\t\t      \n    \t\t\t\n    \tNoch ein  Test.\n    \t\n    \t\n    \t";

And I want to reach this state:


That is a test.
seems to work pretty good? working.
Another test again.

String expectedOutput = "Das ist ein Test.\nsoweit scheint das ganze zu? funktionieren.\nNoch ein Test.\n";

Any ideas? Can this be achieved using regexes?

replaceAll("\\s+", " ") is NOT what I'm looking for, because it also swallows the newlines. If this regex preserved exactly one newline from each whitespace run that contains one, it would be perfect.

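I have tried this, but it seems suboptimal to me:

import java.io.BufferedReader;
import java.io.StringReader;

// Note: readLine() declares IOException, so the enclosing method must catch or declare it.
BufferedReader bufReader = new BufferedReader(new StringReader(testContent));
String line;
StringBuilder newString = new StringBuilder();
while ((line = bufReader.readLine()) != null) {
    // Collapse whitespace runs inside the line, then drop leading/trailing whitespace.
    String temp = line.replaceAll("\\s+", " ").trim();
    if (!temp.isEmpty()) {  // skip lines that contained only whitespace
        newString.append(temp).append("\n");
    }
}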

Solution

In a single regex (plus a small patch for tabs):

input.replaceAll("^\\s+|\\s+$|\\s*(\n)\\s*|(\\s)\\s*", "$1$2")
     .replace("\t"," ");

The regex looks daunting, but in fact decomposes nicely into these parts that are OR-ed together:

  • ^\s+ – match whitespace at the beginning;
  • \s+$ – match whitespace at the end;
  • \s*(\n)\s* – match whitespace containing a newline, and capture that newline;
  • (\s)\s* – match whitespace, capturing the first whitespace character.

The result will be a match with two capture groups, but only one of the groups may be non-empty at a time. This allows me to replace the match with "$1$2", which means "concatenate the two capture groups."

The only remaining problem is that I can't replace a tab with a space using this approach, so I fix that up with a simple non-regex character replacement.
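
To see the pattern at work, here is a minimal sketch that applies it to the testContent string from the question. The class name WhitespaceRegexDemo and the method name normalize are made up for this illustration. One thing it makes visible: because the \s+$ branch removes everything at the end of the input, the result carries no trailing newline, unlike the expectedOutput given in the question.

```java
final class WhitespaceRegexDemo {

    // Hypothetical helper wrapping the single-regex approach described above.
    static String normalize(String input) {
        return input
                // Drop whitespace at both ends of the input, keep one newline for
                // each run that contains a newline, and otherwise keep only the
                // first whitespace character of the run.
                .replaceAll("^\\s+|\\s+$|\\s*(\n)\\s*|(\\s)\\s*", "$1$2")
                // The kept character may be a tab, so turn tabs into spaces.
                .replace("\t", " ");
    }

    public static void main(String[] args) {
        String testContent = "\n\t\n\t\t\t\n\t\t\tDas      ist ein Test.\t\t\t  \n"
                + "\tsoweit scheint das \t\tganze zu? funktionieren.\n\n\n\n\t\t\n\t\t\n"
                + "\t\t\t      \n\t\t\t      \n    \t\t\t\n    \tNoch ein  Test.\n"
                + "    \t\n    \t\n    \t";
        // Prints three lines: the two sentences and "Noch ein Test."
        System.out.println(normalize(testContent));
    }
}
```

If the trailing newline matters, appending "\n" to the result is probably the simplest way to get it back.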

Other tips

In 4 steps:

String result = text
    // 1. compress all non-newline whitespace to a single space
    .replaceAll("[\\s&&[^\\n]]+", " ")
    // 2. remove whitespace from the beginning or end of each line
    .replaceAll("(?m)^\\s|\\s$", "")
    // 3. compress multiple newlines to a single newline
    .replaceAll("\\n+", "\n")
    // 4. remove a newline from the beginning or end of the string
    .replaceAll("^\n|\n$", "");

Why don't you just do this:

String[] lines = split(s,"\n")
String[] noExtraSpaces = removeSpacesInEachLine(lines)
String result = join(noExtraSpaces,"\n")

And don't forget Jamie Zawinski's quotation about regular expressions: https://softwareengineering.stackexchange.com/questions/10998/what-does-the-jamie-zawinskis-quotation-about-regular-expressions-mean
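
The split / clean / join suggestion above is pseudocode. A minimal runnable sketch in Java could look like the following. The class name LineNormalizer and the method name normalizeLines are hypothetical, and the per-line cleanup (trim, collapse inner whitespace, skip lines that end up empty) is only one assumed reading of what removeSpacesInEachLine is meant to do.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

final class LineNormalizer {

    // Hypothetical helper: split on newlines, clean each line, join again.
    static String normalizeLines(String s) {
        return Arrays.stream(s.split("\n"))
                // trim() strips leading/trailing characters up to U+0020 (spaces,
                // tabs, ...); replaceAll then collapses inner whitespace runs.
                .map(line -> line.trim().replaceAll("\\s+", " "))
                // Lines that held only whitespace are now empty, so drop them.
                .filter(line -> !line.isEmpty())
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // Prints "a b" and "c" on separate lines.
        System.out.println(normalizeLines("  a \t  b \n\n\t\n c "));
    }
}
```

Like the asker's BufferedReader loop, this still uses one small regex per line, but the line structure itself is handled by split and join rather than by the pattern.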

First collapse each run of newlines into a single newline, then collapse the remaining whitespace (but not the newlines) into single spaces, and finally remove the whitespace left at the beginning of each line:

String test = "      This is              a real\n\n\n\n\n\n\n\n\n test !!\n\n\n   bye";
test = test.replaceAll("\n+", "\n");
test = test.replaceAll("((?!\n+)\\s+)", " ");
// Anchored with (?m)^ so that only leading whitespace is removed, not every space.
test = test.replaceAll("(?m)^((?!\n+)\\s+)", "");

Output:

This is a real
test !!
bye

If I understand correctly, you simply want to replace a succession of newlines with one newline, so replace \n\n* with \n (with appropriate flags). If the seemingly empty lines contain whitespace, remove that whitespace first (^\s\s*$ in multiline mode), then replace the newlines.

Edit: The only issue here is that some newlines might remain here and there, so you have to be careful to first collapse spaces, then fix the empty line problem. You can trim it down further into probably a single regex, but it's easier to read with these three:

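 import java.util.regex.Pattern;

 // Runs of tabs and spaces inside a line.
 Pattern spaces = Pattern.compile("[\t ]+");
 // Whitespace starting at the beginning of a line (this can span several blank lines).
 Pattern emptyLines = Pattern.compile("^\\s+$?", Pattern.MULTILINE);
 // Any whitespace run that ends in one or more newlines.
 Pattern newlines = Pattern.compile("\\s*\\n+");
 System.out.print(
      newlines.matcher(emptyLines.matcher(spaces.matcher(
        input).replaceAll(" ")).replaceAll("")).replaceAll("\n"));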
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow