Question

I want to use a regex to split a long string into separate lines. A line can contain any Unicode character. A line ends on a dot (".", one or more) or on a newline ("\n").

Example:

This string will be the input:

"line1. line2.. line3... line4.... line5..... line6
\n
line7"

The output:

  • "line1."
  • "line2.."
  • "line3..."
  • "line4...."
  • "line5....."
  • "line6"
  • "line7"

Solution

If I understand what you're asking for, you might try a pattern like this:

(?<=\.)(?!\.)|\n

This splits the string at any position that is preceded by a . but not followed by another ., and also at every \n character (the \n itself is consumed by the split).

Note that this pattern preserves any whitespace after the dots, for example:

// Regular (non-verbatim) string literal, so that \n is a real line feed
var input = "line1. line2.. line3... line4.... line5..... line6\nline7";
var output = Regex.Split(input, @"(?<=\.)(?!\.)|\n");

Produces

line1.
 line2..
 line3...
 line4....
 line5.....
 line6
line7

If you'd like to get rid of the whitespace, simply change the pattern to:

(?<=\.)(?!\.)\s*|\n

But if you know that the dots will always be followed by whitespace, you can simplify this to:

(?<=\.)\s+|\n
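
To compare the two whitespace-trimming patterns, here is a minimal sketch run against the sample input from the question. The variable names (sample, trimmed, simple) are only illustrative, and the input is assumed to contain a real line feed between line6 and line7:

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input from the question; a regular string literal, so \n is a real line feed.
string sample = "line1. line2.. line3... line4.... line5..... line6\nline7";

// Variant 1: split after a run of dots and swallow any whitespace that follows.
string[] trimmed = Regex.Split(sample, @"(?<=\.)(?!\.)\s*|\n");

// Variant 2: simpler, but it only splits where a dot is followed by whitespace.
string[] simple = Regex.Split(sample, @"(?<=\.)\s+|\n");

// Print each piece in quotes so stray whitespace would be visible.
foreach (string line in trimmed)
{
    Console.WriteLine($"\"{line}\"");
}
```

On this input both variants give the same seven pieces. They differ when a dot is followed directly by text: "a.b" is split into "a." and "b" by the first pattern and left whole by the second. If the input can end with a dot, the first pattern may also produce a trailing empty string, which you would probably want to filter out.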

Other tips

Try this:

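// The replacement is a regular string literal: .NET does not process \n escapes in replacement patterns
String result = Regex.Replace(subject, @"""?(\w+([.]+)?)(?:[\n ]|[""\n]$)+", "\"$1\"\n");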

/*
"line1."
"line2.."
"line3..."
"line4...."
"line5....."
"line6"
"line7"
*/

Regex Explanation

"?(\w+([.]+)?)(?:[\n ]|["\n]$)+

Match the character “"” literally «"?»
   Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match the regular expression below and capture its match into backreference number 1 «(\w+([.]+)?)»
   Match a single character that is a “word character” (letters, digits, and underscores) «\w+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
   Match the regular expression below and capture its match into backreference number 2 «([.]+)?»
      Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
      Match the character “.” «[.]+»
         Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regular expression below «(?:[\n ]|["\n]$)+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
   Match either the regular expression below (attempting the next alternative only if this one fails) «[\n ]»
      Match a single character present in the list below «[\n ]»
         A line feed character «\n»
         The character “ ” « »
   Or match regular expression number 2 below (the entire group fails if this one fails to match) «["\n]$»
      Match a single character present in the list below «["\n]»
         The character “"” «"»
         A line feed character «\n»
      Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
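
The Regex.Replace call above can be tried out with a sketch like the following. It assumes the subject is the sample text wrapped in literal double quotes (the pattern's optional leading quote and its ["\n]$ branch suggest this), and it writes the replacement as a regular string literal so that \n is a real line feed:

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed input: the sample text wrapped in literal double quotes.
string subject = "\"line1. line2.. line3... line4.... line5..... line6\nline7\"";

// Same pattern as above, in a verbatim literal ("" stands for one quote character).
string pattern = @"""?(\w+([.]+)?)(?:[\n ]|[""\n]$)+";

// Regular string literal for the replacement, so \n is a real line feed.
string result = Regex.Replace(subject, pattern, "\"$1\"\n");

Console.Write(result);
```

Keep in mind that \w+ matches only word characters, so a line that itself contains spaces (for example "hello world.") would be broken at every space. That does not fit the "any Unicode character" requirement from the question, so treat this as a solution for the sample input only.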

If you want to keep all dots intact and each run of dots is followed by a whitespace character, then this could be your regex (note the escaped dot, so that it matches only a literal "."):

String result = Regex.Replace(t, @"\.\s", ".\n");

This produces a single string. You haven't said whether you want one string or several as the result.
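
If separate strings are needed after all, the single result can be split on line feeds. A minimal sketch, assuming the dot in the pattern is escaped (\.) and the replacement is a regular string literal so that \n is a real line feed; the names joined and lines are only illustrative:

```csharp
using System;
using System.Text.RegularExpressions;

string t = "line1. line2.. line3... line4.... line5..... line6\nline7";

// Replace the whitespace character after each run of dots with a line feed.
// The dot is escaped so that it matches only a literal ".".
string joined = Regex.Replace(t, @"\.\s", ".\n");

// Split the single result on line feeds to get one string per line.
string[] lines = joined.Split('\n');

foreach (string line in lines)
{
    Console.WriteLine(line);
}
```

This only handles a single whitespace character after the dots. If several may follow, \s+ in the pattern would swallow them all.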

Licensed under: CC-BY-SA with attribution