Problem

I'm trying to extract information from rc-files. In these files, " characters in strings are escaped by doubling them (""), analogous to C# verbatim strings. Is there a way to extract the string?

For example, if I have the following string "this is a ""test""" I would like to obtain this is a ""test"". It also must be non-greedy (very important).

I've tried to use the following regular expression:

"(?<text>[^""]*(""(.|""|[^"])*)*)"

However, the performance was awful. I've based it on the explanation here: http://ad.hominem.org/log/2005/05/quoted_strings.php

Does anybody have an idea how to handle this with a regular expression?

Solution

You've got some nested repetition quantifiers there. That can be catastrophic for performance, because the engine may try a huge number of ways to split the same text between them before giving up.

Try something like this:

(?<=")(?:[^"]|"")*(?=")

That can now only consume either two quotes at once or a single non-quote character. The lookbehind and lookahead assert that the actual match is preceded and followed by a quote.

This also gets you around having to capture anything. Your desired result will simply be the full string you want (without the outer quotes).

I do not assert that the outer quotes are not doubled, because if they were, there would be no way to distinguish them from an empty string anyway.
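
As a quick check, here is a minimal Python sketch of the pattern above. The helper name `inner_text` is hypothetical, and Python's `re` module stands in for whatever regex engine you actually use. Since the lookarounds do not consume the outer quotes, the sketch applies the pattern to one literal at a time.

```python
import re

# Pattern from the answer: a non-quote character or a doubled quote,
# repeated, with a quote asserted (but not consumed) on both sides.
INNER = re.compile(r'(?<=")(?:[^"]|"")*(?=")')

def inner_text(literal):
    """Return the text between the outer quotes, or None if nothing matches."""
    m = INNER.search(literal)
    return m.group(0) if m else None

print(inner_text('"this is a ""test"""'))  # this is a ""test""
```

The doubled quotes are still present in the result, so unescaping them remains a separate step.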

Other tips

This turns out to be a lot simpler than you'd expect. A string literal with escaped quotes looks exactly like a bunch of simple string literals run together:

"Some ""escaped"" quotes"

"Some " + "escaped" + " quotes"

So this is all you need to match it:

(?:"[^"]*")+

You'll have to strip off the leading and trailing quotes in a separate step, but that's not a big deal. You would need a separate step anyway, to unescape the escaped quotes (\" or "").
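
A rough Python sketch of that two-step approach, assuming Python's `re` module and the `""` escape convention from the question; `extract_strings` is a hypothetical helper name:

```python
import re

# One or more simple "..." literals run together, as described above.
LITERAL = re.compile(r'(?:"[^"]*")+')

def extract_strings(line):
    """Return the unescaped contents of every string literal in line."""
    results = []
    for m in LITERAL.finditer(line):
        raw = m.group(0)          # includes the outer quotes
        inner = raw[1:-1]         # step 1: strip leading and trailing quote
        results.append(inner.replace('""', '"'))  # step 2: unescape
    return results

print(extract_strings('CAPTION "Some ""escaped"" quotes", "other"'))
# ['Some "escaped" quotes', 'other']
```

Because each match consumes its own outer quotes, this also works when a line holds several literals.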

I don't know if this is better or worse than m.buettner's (guessing not, he seems to know his stuff), but I thought I'd throw it out there for critique.

"(([^"]+(""[^"]+"")*)*)"

Try this: (?<=^")(.*?"{2}.*?"{2})(?="$). It may be faster than the two previous ones, and without any bugs.

  • Match a " beginning the string
  • Multiple times match a non-" or two "
  • Match a " ending the string

"([^"]|(""))*?"

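The three steps listed above can be sketched in Python as follows. One assumption to flag: this sketch uses a greedy `*` instead of the lazy `*?` shown, because in Python's `re` (at least) the lazy form accepts the first quote it meets as the closing one, so it matches only `"this is a "` on the example from the question. The name `find_quoted` is hypothetical.

```python
import re

# Step 1: opening quote. Step 2: repeat "two quotes" or "one non-quote".
# Step 3: closing quote. The repetition is greedy (an assumption of this
# sketch), so a doubled quote is always consumed as a pair.
QUOTED = re.compile(r'"((?:""|[^"])*)"')

def find_quoted(text):
    """Return the inner text of every quoted literal found in text."""
    return [m.group(1) for m in QUOTED.finditer(text)]

print(find_quoted('IDS_A "this is a ""test""" IDS_B "x"'))
# ['this is a ""test""', 'x']
```

Each alternative consumes at least one character and the two cannot match at the same position, so the greedy loop does not introduce the nested-quantifier backtracking discussed earlier.
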
License: CC-BY-SA with attribution
Not affiliated with StackOverflow