Question

This is related to this question.

Here is a regex: (?<key>[^:\s]+): (?<value>(?>[^\n]*\R)*?[^\n]*)(?=\R\S+:|$). It should parse multi-line "key: value" pairs, but one example is not parsed correctly.

Could you please help me modify the original regex?

The example, regex, and bug are here (look at the uncolored line): http://regex101.com/r/sH9lP9

ОПИСАНИЕ should be the key

Fолько: РФ: Квартира `в` хорошем ~ 1500 ~`!@#$%^&*'()_+=-\|</>{.}
fdsdf[,]";:? состояние. по - оплате 25000+К/У`

should be the value.


Solution

Your regular expression has a literal space after the colon, so it requires a space after the colon in every "key: value" pair. In your example, the one line that doesn't match has a question mark immediately after the colon instead of a space, so it fails to match.
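To make the failure concrete, here is a minimal sketch that ports the original pattern to Python's `re` module. The port is an assumption on my part, not the asker's setup: Python has no `\R`, so it is replaced with `\n` (input is assumed to use `\n` line endings), the atomic group `(?>...)` becomes a plain non-capturing group, and named groups use the `(?P<name>...)` spelling. The sample text is a shortened, hypothetical stand-in for the data on regex101.

```python
import re

# Python port of the original pattern (assumes "\n" line endings).
# \R -> \n, (?>...) -> (?:...), (?<name>...) -> (?P<name>...)
ORIGINAL = re.compile(
    r"(?P<key>[^:\s]+): (?P<value>(?:[^\n]*\n)*?[^\n]*)(?=\n\S+:|$)"
)

# Hypothetical sample: the third line has "?" right after the colon,
# like the uncolored line in the question.
SAMPLE = "NAME: Flat\nDESC: Good: condition\nfoo:? bar\nPRICE: 25000"

original_pairs = [
    (m.group("key"), m.group("value")) for m in ORIGINAL.finditer(SAMPLE)
]
print(original_pairs)
# [('NAME', 'Flat'), ('DESC', 'Good: condition'), ('PRICE', '25000')]
```

The `foo:? bar` line is missing from the result. The lookahead `\S+:` ends the `DESC` value right before it, but the line cannot start a new pair, because no space follows its colon.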

You could use one of the following solutions:

(?<key>[^:\s]+): ?(?<value>(?>[^\n]*\R)*?[^\n]*)(?=\R\S+:|$)

The question mark added after the space makes the space optional. Or:

(?<key>[^:\s]+):\s*(?<value>(?>[^\n]*\R)*?[^\n]*)(?=\R\S+:|$)

This consumes all whitespace after the colon, which might be the best option, since it is less strict about how whitespace is used. Note that \s also matches line breaks.

Alternatively, if the space really is required and the whole line should be part of the value for the previous key, then add a space to the subpattern that determines where the value ends. In other words, change the regex as follows (a space is added after the ':' in the lookahead at the end):

(?<key>[^:\s]+): (?<value>(?>[^\n]*\R)*?[^\n]*)(?=\R\S+: |$)
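
The two kinds of fix behave differently on the problem line, which the following sketch tries to show. It is a Python port under stated assumptions, not the original PCRE: `\R` is replaced with `\n` (so `\n` line endings are assumed), the atomic group is written as a plain non-capturing group, and named groups use `(?P<name>...)`. The helper `parse`, the pattern names, and the sample text are all hypothetical.

```python
import re

# Hypothetical sample: the third line has "?" right after the colon.
SAMPLE = "NAME: Flat\nDESC: Good: condition\nfoo:? bar\nPRICE: 25000"

# Shared value subpattern, ported to Python (\R -> \n, (?>...) -> (?:...)).
VALUE = r"(?P<value>(?:[^\n]*\n)*?[^\n]*)"

# Variant 1: any whitespace (possibly none) allowed after the colon.
OPTIONAL_WS = r"(?P<key>[^:\s]+):\s*" + VALUE + r"(?=\n\S+:|$)"

# Variant 2: space still required, and the lookahead requires it too.
STRICT_END = r"(?P<key>[^:\s]+): " + VALUE + r"(?=\n\S+: |$)"


def parse(pattern, text):
    """Return all (key, value) pairs found by the pattern."""
    return [(m.group("key"), m.group("value")) for m in re.finditer(pattern, text)]


optional_ws_pairs = parse(OPTIONAL_WS, SAMPLE)
strict_end_pairs = parse(STRICT_END, SAMPLE)

print(optional_ws_pairs)
# [('NAME', 'Flat'), ('DESC', 'Good: condition'), ('foo', '? bar'), ('PRICE', '25000')]
print(strict_end_pairs)
# [('NAME', 'Flat'), ('DESC', 'Good: condition\nfoo:? bar'), ('PRICE', '25000')]
```

With optional whitespace, the problem line becomes its own pair with key `foo`. With the space added to the lookahead, the line is folded into the value of the preceding key, which appears to be what the question asks for.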
Licensed under: CC-BY-SA with attribution