Question

I'd like to identify certain values in the following string, specifically the values inside CVC and Number:

CreditCard Number="123" CVC="213" Date="2015-12"

(?<=CVC=\").*(?=") matches 213" Date="2015-12. How can I modify the regex to look for the first doublequote match after something was found, and not to look for the last doublequote as it does now?

Further: how can I define wildcards in lookbehinds? Ideally I'd like to have an expression: (?<=CreditCard.*CVC=\").*(?=") which means that a CVC statement must be preceded by the string "CreditCard", but between them there could be any values.


Solution

You can simply make the .* non-greedy by writing .*? so that it stops at the first closing double quote instead of the last one:

(?<=CVC=\").*?(?=")
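As a minimal sketch of how the non-greedy version could be used from Java (the class name LazyLookaround and the helper extractCvc are made up for illustration, not part of the original answer):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyLookaround {
    // Hypothetical helper: returns the text between CVC=" and the next
    // double quote, or null if the input has no CVC attribute.
    public static String extractCvc(String input) {
        // Fixed-length lookbehind, lazy body, lookahead for the closing quote.
        Pattern p = Pattern.compile("(?<=CVC=\").*?(?=\")");
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "CreditCard Number=\"123\" CVC=\"213\" Date=\"2015-12\"";
        System.out.println(extractCvc(input)); // prints 213
    }
}
```

With the greedy .* the same call would return 213" Date="2015-12, because the lookahead would then be satisfied by the last double quote in the input.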


In answer to your second question: Java regex (like most other engines) doesn't allow lookbehinds of unbounded length, so a .* inside a lookbehind is rejected. Usually, though, you can solve a problem that seems to require a variable-length lookbehind by using capture groups instead:

(?<=CreditCard.*CVC=\").*?(?=")

becomes:

CreditCard.*?CVC=\"(.*?)"

And then you can take the relevant information from capture group 1. (In the RegExr demo, a .* was added so that the output replaces the entire input; it's not required for your case.)
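A rough sketch of the capture-group approach in Java, assuming the input looks like the example in the question (the class name CaptureGroupCvc and the helper extractCvc are illustrative names only):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupCvc {
    // Hypothetical helper: returns the CVC value only when "CreditCard"
    // appears somewhere before it, otherwise null.
    public static String extractCvc(String input) {
        // The prefix is matched normally instead of in a lookbehind,
        // and the wanted value is captured in group 1.
        Pattern p = Pattern.compile("CreditCard.*?CVC=\"(.*?)\"");
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "CreditCard Number=\"123\" CVC=\"213\" Date=\"2015-12\"";
        System.out.println(extractCvc(input)); // prints 213
    }
}
```

Note that m.group() (group 0) would return the whole match starting at "CreditCard", which is why group 1 is read here.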

OTHER TIPS

You could skip using lookbehinds, and instead use capture groups to pull out just the portions of the string you want:

CreditCard Number="(\d*)".*\sCVC="(\d*)"

And then the match groups numbered 1 and 2 will correspond to your credit card number and CVC, respectively. (You can use Matcher.group(int) to retrieve the values of the various groups.) Notice that by using \d to specifically match digits, you don't have to make the * non-greedy, because \d* cannot run past the closing quote. This works here because you only want to match digits. In the general case (let's say a credit card number could consist of any non-quote character), you can use a negated character class to match anything but your delimiter (a double quote in this case):

CreditCard Number="([^"]*)".*\sCVC="([^"]*)"
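A small sketch of reading both groups with Matcher.group(int), assuming the attribute order shown in the question (the class name CardFields and the helper extract are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CardFields {
    // Hypothetical helper: returns {number, cvc}, or null if the input
    // does not match the expected layout.
    public static String[] extract(String input) {
        // [^"]* stops at the closing quote by construction,
        // so no non-greedy quantifier is needed inside the groups.
        Pattern p = Pattern.compile(
                "CreditCard Number=\"([^\"]*)\".*\\sCVC=\"([^\"]*)\"");
        Matcher m = p.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String input = "CreditCard Number=\"123\" CVC=\"213\" Date=\"2015-12\"";
        String[] fields = extract(input);
        System.out.println(fields[0]); // prints 123
        System.out.println(fields[1]); // prints 213
    }
}
```

Swapping [^\"]* back to \\d* would restrict both values to digits, as in the first pattern of this answer.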
Licensed under: CC-BY-SA with attribution