Question

I have a string that I would like to parse using a regular expression. A leading .. marks a category name, and everything after the : is the content for that category.

Below is the full string I'm trying to parse:

..NAME: JOHN
..BDAY: 1/1/2010
..NOTE: 1. some note 1
 2. some note 2
 3. some note 3
..DATE: 6/3/2014

I'm trying to parse it so that

(group 1) 
..NAME: JOHN

(group 2)
..BDAY: 1/1/2010

(group 3)
..NOTE: 1. some note 1
 2. some note 2
 3. some note 3

(group 4)
..DATE: 6/3/2014  //a.k.a update date

The regular expression pattern I use is

\.\.[A-Z0-9]{2,4}:.*

With this pattern, (group 3) is only ..NOTE: 1. some note 1; the content on the second and third lines is missing.

How can I modify my pattern so I can get the correct grouping?


Solution

By default, . matches any character except a newline in most regex flavors. To let it match newlines too, use RegexOptions.Singleline in C# (or the s modifier in PCRE; Ruby names the equivalent modifier m).
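
To see what that option changes, here is a small sketch in Python (an assumption on my part, since the question does not name a language), where re.DOTALL plays the role of RegexOptions.Singleline. The names TEXT and PATTERN are only for illustration. It runs the original pattern with and without the flag:

```python
import re

# Sample input from the question (no trailing newline).
TEXT = (
    "..NAME: JOHN\n"
    "..BDAY: 1/1/2010\n"
    "..NOTE: 1. some note 1\n"
    " 2. some note 2\n"
    " 3. some note 3\n"
    "..DATE: 6/3/2014"
)

# The pattern from the question, unchanged.
PATTERN = r"\.\.[A-Z0-9]{2,4}:.*"

# Default mode: "." stops at every newline, so the NOTE entry
# keeps only its first line.
default_matches = re.findall(PATTERN, TEXT)

# DOTALL mode: "." also matches "\n", and the greedy ".*"
# swallows everything up to the end of the string in one match.
dotall_matches = re.findall(PATTERN, TEXT, flags=re.DOTALL)

print(default_matches)
print(len(dotall_matches))
```

So the flag alone is not enough: without it the notes are cut short, and with it the first match consumes the whole string.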


You will also need to make .* lazy, so that it stops at the next .. or at the end of the string ($) instead of matching everything on the first attempt. Also, . has no special meaning inside a character class, so the expression may look cleaner written like this:

[.]{2}[A-Z0-9]{2,4}:.*?(?=[.]{2}|$)
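
As a rough check of this pattern, the following Python sketch applies it with re.DOTALL (Python's counterpart of RegexOptions.Singleline; using Python here is my assumption, not something the question specifies). ENTRY, TEXT and entries are illustrative names:

```python
import re

# Sample input from the question (no trailing newline).
TEXT = (
    "..NAME: JOHN\n"
    "..BDAY: 1/1/2010\n"
    "..NOTE: 1. some note 1\n"
    " 2. some note 2\n"
    " 3. some note 3\n"
    "..DATE: 6/3/2014"
)

# Lazy ".*?" plus a lookahead that stops before the next ".."
# or at the end of the string. DOTALL lets "." cross newlines.
ENTRY = re.compile(r"[.]{2}[A-Z0-9]{2,4}:.*?(?=[.]{2}|$)", re.DOTALL)

# Each match includes the newline that precedes the next "..",
# so strip it to get clean entries.
entries = [m.group(0).rstrip("\n") for m in ENTRY.finditer(TEXT)]

for entry in entries:
    print(repr(entry))
```

One thing to keep in mind: the lookahead stops at any two consecutive dots, so content that itself contains .. would be split there.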


Other suggestions

I managed to achieve it with a negative lookahead for [.]{2}:

[.]{2}[A-Z0-9]{2,4}:(.*\n?(?![.]{2}))*
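
A quick way to try this variant is the Python sketch below (Python is my choice for illustration; the pattern itself is copied verbatim). It needs no dot-matches-newline flag, because the newline is matched explicitly by \n?. Since the pattern contains a capturing group, the sketch reads the whole match with group(0) rather than relying on findall:

```python
import re

# Sample input from the question (no trailing newline).
TEXT = (
    "..NAME: JOHN\n"
    "..BDAY: 1/1/2010\n"
    "..NOTE: 1. some note 1\n"
    " 2. some note 2\n"
    " 3. some note 3\n"
    "..DATE: 6/3/2014"
)

# Consume one line at a time; the negative lookahead refuses to
# take a newline that is directly followed by "..".
ENTRY = re.compile(r"[.]{2}[A-Z0-9]{2,4}:(.*\n?(?![.]{2}))*")

# group(0) is the full match; the inner group only holds the
# last repetition and is not useful here.
entries = [m.group(0) for m in ENTRY.finditer(TEXT)]

for entry in entries:
    print(repr(entry))
```

Unlike the lazy version, this one only treats .. as a delimiter at the start of a line, so two dots in the middle of a note do not end the entry.
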
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow