Question

I'm trying to write a regular expression that matches any possible way of describing a Purchase Order Number in OCR data from an invoice. This means including variants such as P.O. or PO. That becomes a problem when the data contains a PO Box address, because the letters PO in "PO BOX" also match.

I tried to use a negative lookahead to make this case fail, but I'm not sure I'm doing it right. I need that case to fail to match entirely rather than match partially. I'm using the .NET flavor of regex, and this is the expression I'm currently using:

(?!=\s{0,3}[Bb][Oo][Xx])((([Cc]ustomer|[Cc]ust\.?) {0,5})?([Pp]\.? *[Oo]\.? *|[Pp]urchase +[Oo]rder)) *([Nn]um\.?(ber)?|[Nn]o\.?)? *#? *:?

Unfortunately, this still matches the letters PO when they are part of a PO Box. How can I make that case fail using only a single regular expression?


Solution

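The first part of your regex is incorrect: (?!=\s{0,3} should be (?!\s{0,3}. The = is not part of the lookahead syntax. Inside (?!...) it is a literal character, so your lookahead only rejects a position followed by "=", optional whitespace and BOX, which never occurs in your data.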
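A quick way to see the difference is sketched below in Python's re module rather than .NET (both engines treat (?!...) the same way here). The lookahead is placed right after a literal PO, and the sample strings are made up for illustration:

```python
import re

# "(?!=" is an ordinary negative lookahead whose body starts with a literal "=",
# so it only rejects positions followed by "=", optional whitespace and "BOX".
buggy = re.compile(r"PO(?!=\s{0,3}BOX)")
# Without the "=", it rejects positions followed by optional whitespace and "BOX".
fixed = re.compile(r"PO(?!\s{0,3}BOX)")

for text in ["PO BOX", "PO= BOX", "PO 4512"]:
    # columns: text, did buggy match?, did fixed match?
    print(repr(text), bool(buggy.match(text)), bool(fixed.match(text)))
# 'PO BOX' True False
# 'PO= BOX' False True
# 'PO 4512' True True
```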

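You are also looking ahead from the wrong place. At the start of the pattern, the lookahead runs before "PO" has been consumed, so it sees "PO BOX" rather than " BOX" and never fires. Move (?!\s{0,3}[Bb][Oo][Xx]) to just after the part that matches "PO", "P.O." and so on.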

So your regex looks like this:

((([Cc]ustomer|[Cc]ust\.?) {0,5})?([Pp]\.? *[Oo]\.? *(?!\s{0,3}[Bb][Oo][Xx])|[Pp]urchase +[Oo]rder)) *([Nn]um\.?(ber)?|[Nn]o\.?)? *#? *:?
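A minimal way to check the corrected case-sensitive pattern is sketched below in Python's re as a stand-in for .NET. The pattern string is copied verbatim from above; the name PO_PATTERN and the sample OCR snippets are hypothetical:

```python
import re

# The corrected, case-sensitive pattern, copied verbatim (split only for line length).
PO_PATTERN = re.compile(
    r"((([Cc]ustomer|[Cc]ust\.?) {0,5})?([Pp]\.? *[Oo]\.? *(?!\s{0,3}[Bb][Oo][Xx])"
    r"|[Pp]urchase +[Oo]rder)) *([Nn]um\.?(ber)?|[Nn]o\.?)? *#? *:?"
)

# Hypothetical OCR snippets, for illustration only.
samples = ["PO Number: 4512", "Customer PO #: 881", "Purchase Order No. 7", "PO Box 42"]
for text in samples:
    m = PO_PATTERN.search(text)
    print(repr(text), "->", repr(m.group(0)) if m else None)
```

The first three snippets should produce a match, while "PO Box 42" should produce none, because every way of matching "PO" there leaves whitespace plus "Box" for the lookahead to see.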

Also, do yourself a favor: unless you need case sensitivity, turn on case-insensitive matching with RegexOptions.IgnoreCase and use a simpler regex:

(((Customer|Cust\.?) {0,5})?(P\.? *O\.? *(?!\s{0,3}BOX)|Purchase +Order)) *(Num\.?(ber)?|No\.?)? *#? *:?
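One caveat is worth checking, sketched here in Python's re with re.IGNORECASE standing in for RegexOptions.IgnoreCase. Because the periods and spaces before the lookahead are optional, backtracking can still produce a partial match on "P.O. Box". The engine gives back the final period, and the lookahead then sees "." instead of whitespace followed by BOX, so it succeeds and "P.O" matches. The ANCHORED variant is a suggested adjustment, not the pattern above. It moves the lookahead to the start of the P.O. alternative so the whole "P.O. Box" span is inspected before anything is consumed. The sample strings are hypothetical:

```python
import re

# The case-insensitive pattern, copied verbatim (split only for line length).
ANSWER = re.compile(
    r"(((Customer|Cust\.?) {0,5})?(P\.? *O\.? *(?!\s{0,3}BOX)|Purchase +Order))"
    r" *(Num\.?(ber)?|No\.?)? *#? *:?",
    re.IGNORECASE,
)

# Assumed variant: the lookahead checks "P.O. BOX" from the first character,
# so giving back the trailing period cannot sidestep it.
ANCHORED = re.compile(
    r"(((Customer|Cust\.?) {0,5})?((?!P\.? *O\.? *\s{0,3}BOX)P\.? *O\.? *|Purchase +Order))"
    r" *(Num\.?(ber)?|No\.?)? *#? *:?",
    re.IGNORECASE,
)

for text in ["po box 42", "P.O. Box 17", "P.O. Number 4512"]:
    a, b = ANSWER.search(text), ANCHORED.search(text)
    # columns: text, ANSWER's match (or None), ANCHORED's match (or None)
    print(repr(text), a and a.group(0), b and b.group(0))
```

The same pattern strings should carry over to .NET with RegexOptions.IgnoreCase. In .NET (and Python 3.11+), an atomic group such as (?>P\.? *O\.? *)(?!\s{0,3}BOX) is another way to stop the engine from giving back the period.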
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow