Regex: Match character but exclude from pattern

https://stackoverflow.com/questions/23204421

07-07-2023

Question

I have a barcode that I'm trying to parse with the built-in regular expression support in our document imaging software, which I believe is .NET-based. The barcodes are on loan documents and include an account number and a sub-account number delimited by a dash (-). The difficult part is that as the sub-account number gets shorter, the account number is zero-filled to compensate.

In the examples below, the account/sub-account portion starts at position 11 and runs for 15 characters (including the dash). The first 10 zeroes in every example are actually another field that is not currently used, so matching everything before the dash works for now but will break if that field is ever populated. I need two regex patterns, one to match the account number before the dash and one to match the sub-account number after it, taken from positions 11-25 and split on the dash.

It's fine if the matches include the dash on the sub-account number or leading zeroes on the account number: the software has an option to "Remove all leading occurrences of the __ character", so I can strip the leading zeroes from the account and the leading dash from the sub-account automatically.

0000000000123456789-12345133304302014

account=123456789 sub=12345

00000000000123456789-1234133304302014

account=0123456789 sub=1234

000000000000123456789-123133304302014

account=00123456789 sub=123

0000000000000123456789-12133304302014

account=000123456789 sub=12

00000000000000123456789-1133304302014

account=0000123456789 sub=1
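To make the layout concrete, here is a plain Ruby slice (not a regex, so not something the imaging software's regex option can use) that takes positions 11-25, i.e. 15 characters starting at 0-based index 10, and splits them on the dash. The barcodes are copied from the examples above; `account_field` is a hypothetical helper name used only for illustration.

```ruby
# Barcodes copied from the examples in the question.
BARCODES = %w[
  0000000000123456789-12345133304302014
  00000000000123456789-1234133304302014
  000000000000123456789-123133304302014
  0000000000000123456789-12133304302014
  00000000000000123456789-1133304302014
]

# Hypothetical helper: positions 11-25 (1-based) are the
# 15 characters starting at 0-based index 10.
def account_field(barcode)
  barcode[10, 15]
end

BARCODES.each do |code|
  account, sub = account_field(code).split("-")
  puts "account=#{account} sub=#{sub}"
end
```

This prints the same account=/sub= pairs listed above, which confirms that every example is 37 characters long with the account-sub portion at a fixed offset.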

EDIT:

The final working regex syntax is as follows:

account number = [1-9].*(?=.*-)

sub-account number = (?<=-).*(?=(............$))
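The lookaround idea in the edit can be sketched in Ruby as follows. This is an illustration, not the poster's verified pattern: it assumes the account and sub-account patterns use `.*` quantifiers, i.e. `[1-9].*(?=.*-)` and `(?<=-).*(?=.{12}$)`, with the twelve dots written as the equivalent `.{12}`.

```ruby
barcode = "0000000000123456789-12345133304302014"

# Account: the first non-zero digit, then as many characters as possible
# while a dash still lies somewhere ahead (so the match stops before the dash).
ACCOUNT_RE = /[1-9].*(?=.*-)/

# Sub-account: everything after the dash, stopping so that exactly
# 12 characters (the trailing field) remain before the end.
SUB_RE = /(?<=-).*(?=.{12}$)/

account = barcode[ACCOUNT_RE]
sub = barcode[SUB_RE]
puts "account=#{account} sub=#{sub}"  # prints "account=123456789 sub=12345"
```

Because the account pattern starts at the first non-zero digit, it skips the zero padding without any post-processing, but it also relies on the unused 10-digit field staying all zeros, which is the limitation the question raises.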

Solution

Based on the revised description, this captures characters 11 to 25 in two separate groups, split on (and not including) the dash. The leading 10-digit field and the trailing 12-digit field are discarded.

\d{10}(\d+)-(\d+)\d{12}
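A quick check of this pattern against all five sample barcodes, written in Ruby as a stand-in for the .NET engine mentioned in the question (the .NET engine itself is not exercised here). Group 1 keeps the account's zero padding, matching the expected values in the question.

```ruby
BARCODE_RE = /\d{10}(\d+)-(\d+)\d{12}/

barcodes = %w[
  0000000000123456789-12345133304302014
  00000000000123456789-1234133304302014
  000000000000123456789-123133304302014
  0000000000000123456789-12133304302014
  00000000000000123456789-1133304302014
]

barcodes.each do |code|
  m = code.match(BARCODE_RE)
  # Group 1: account (still zero-padded); group 2: sub-account.
  puts "account=#{m[1]} sub=#{m[2]}"
end
```

The greedy `(\d+)` after the dash first runs to the end of the string and then backs off until exactly 12 digits remain for `\d{12}`, which is what isolates the sub-account.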

If in the future you needed to also capture the leading and trailing fields in their own groups:

(\d{10})(\d+)-(\d+)(\d{12})
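With four groups, Ruby's `MatchData#captures` returns the fields in order, so they can be unpacked in one line. The variable names below are illustrative only.

```ruby
FIELDS_RE = /(\d{10})(\d+)-(\d+)(\d{12})/

m = "0000000000123456789-12345133304302014".match(FIELDS_RE)
leading, account, sub, trailing = m.captures

p leading   # => "0000000000" (currently unused field)
p account   # => "123456789"
p sub       # => "12345"
p trailing  # => "133304302014"
```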

If you want, you can also remove the zero padding from the account number by matching zero or more leading zeros outside the capture group:

(\d{10})0*(\d+)-(\d+)(\d{12})
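A minimal Ruby sketch of the zero-stripping version; `account_and_sub` is a hypothetical helper that returns the account (without padding) and sub-account, or `nil` when the input does not match.

```ruby
PADDED_RE = /(\d{10})0*(\d+)-(\d+)(\d{12})/

# Hypothetical helper: [account, sub] with the account's zero padding
# removed, or nil if the barcode doesn't match.
def account_and_sub(code)
  m = code.match(PADDED_RE)
  m && [m[2], m[3]]
end

p account_and_sub("00000000000123456789-1234133304302014")  # => ["123456789", "1234"]
p account_and_sub("00000000000000123456789-1133304302014")  # => ["123456789", "1"]
```

Since `(\d{10})` is matched first, any zeros belonging to the unused leading field stay in group 1; `0*` only consumes the padding at the start of the account.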

(These solutions assume the lengths of the first and last fields are fixed.)

OTHER TIPS

How about

(\d+)(?:-)(\d+)

This has two capturing groups separated by a non-capturing hyphen.

You may not need the (?:) wrapper; a bare - between the groups is not captured either.

Exact details depend on the regex implementation.
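A small Ruby check that the non-capturing wrapper changes nothing: both patterns return the same two groups. Run on a full sample barcode from the question, it also shows that this pattern on its own keeps the unused leading field in the first group and the trailing 12 digits in the second.

```ruby
code = "0000000000123456789-12345133304302014"

with_group = code.match(/(\d+)(?:-)(\d+)/).captures
bare_dash  = code.match(/(\d+)-(\d+)/).captures

p with_group == bare_dash  # => true
# Without the fixed-width fields, the groups keep the leading and trailing digits:
p bare_dash  # => ["0000000000123456789", "12345133304302014"]
```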

The example below is in Ruby; if you need another language, let me know.

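account_string = "123456789-12345"  # hypothetical input: just the account-sub portion

# Capture the digits on either side of the dash.
parsed_numbers = account_string.match(/(\d+)-(\d+)/)
if parsed_numbers
  account_number = parsed_numbers[1]      # digits before the dash
  sub_account_number = parsed_numbers[2]  # digits after the dash
end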

^(\d+)- will capture the first half (the account number).

^0*(\d+)- will capture the first half without its leading zeros.

-(\d+)$ will capture the second half (the sub-account).
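What each of these three patterns returns on a full sample barcode, sketched with Ruby's `String#[]`, which takes a regex and a capture-group index:

```ruby
code = "0000000000123456789-12345133304302014"

# String#[] with a capture index returns just that group (or nil if no match).
p code[/^(\d+)-/, 1]    # => "0000000000123456789"
p code[/^0*(\d+)-/, 1]  # => "123456789"
p code[/-(\d+)$/, 1]    # => "12345133304302014"
```

On the full barcode, `^0*` also strips the zeros of the unused leading field, and `-(\d+)$` runs through the trailing 12-digit field, so these patterns fit best when the input holds only the account-sub portion.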

To capture the account number before the dash, use:

0*([1-9]\d*)-
And to capture the one after the dash, use:

-(\d+)
If you want to capture both at once, use:

0*([1-9]\d*)-(\d+)

Assumption: since 0 is used to "compensate", the account number cannot start with 0.
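A Ruby sketch of the combined pattern on the second sample barcode. Because it is unanchored, group 2 runs into the trailing 12-digit field; appending `\d{12}`, as in the solution above, trims it back to the sub-account.

```ruby
code = "00000000000123456789-1234133304302014"

m = code.match(/0*([1-9]\d*)-(\d+)/)
p m[1]  # => "123456789"
p m[2]  # => "1234133304302014" (includes the trailing 12-digit field)

# Appending \d{12} leaves only the sub-account in group 2.
trimmed = code.match(/0*([1-9]\d*)-(\d+)\d{12}/)
p trimmed[2]  # => "1234"
```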

Licensed under: CC-BY-SA with attribution
