Question

I have an email that I wish to parse. Its body contains lines like:

[Event Type]    HireEmployee
[REQUESTOR] POLM4
[SIN]   092
[Employee Name] JOHN,SMITH
[Existing payroll record]   False
[Existing PERM OA Mnemonic] 

I need to be able to parse out the information after each header to store into a variable.

(\[REQUESTOR\]\t)[a-zA-Z0-9]+

will get me the line

[REQUESTOR] POLM4

but I only want it to return "POLM4"

Thanks

EDIT: I'm doing my testing on http://regexpal.com/


Solution

Put the part you don't want in a non-capture group.

For example, instead of your original expression, you do:

(?:\[REQUESTOR\]\t)([a-zA-Z0-9]+)

http://www.debuggex.com/i/brf8zRxz4OcPCTjb.png

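Now the [REQUESTOR] part is in a non-capture group and the value is in the capture group. The overall match still includes the header, so read capture group 1 to get just POLM4.

Non-capture groups are groups that must match but whose contents are not saved.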
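
The question does not say which language the final code will be in, so the following is only a minimal sketch assuming Python's re module. The sample string and the variable names (body, pattern, requestor) are made up for illustration. It shows that group(0) is the whole match while group(1) holds only the captured value:

```python
import re

# Hypothetical sample body; header and value are assumed to be
# separated by a single tab, as in the pattern from the question.
body = "[Event Type]\tHireEmployee\n[REQUESTOR]\tPOLM4\n[SIN]\t092\n"

pattern = re.compile(r"(?:\[REQUESTOR\]\t)([a-zA-Z0-9]+)")

match = pattern.search(body)
# group(0) is the full match (header included);
# group(1) is only what the capture group matched.
requestor = match.group(1) if match else None
print(requestor)
```

If the real email separates header and value with spaces rather than a tab, the \t in the pattern would need to change accordingly (for example to [ \t]+).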

OTHER TIPS

If your regex engine supports it, you can use a positive lookbehind. Your regex would then become, for example:

(?<=\[REQUESTOR\]\t)[a-zA-Z0-9]+

The lookbehind requires [REQUESTOR] and the tab to come immediately before the match, but does not include them in the match itself.
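
A minimal sketch of the lookbehind variant, again assuming Python's re module (which accepts lookbehinds of fixed width, as this one is) and a made-up sample string. Here the whole match is already just the value, so no group index is needed:

```python
import re

# Hypothetical sample body with tab-separated header and value.
body = "[REQUESTOR]\tPOLM4\n[SIN]\t092\n"

match = re.search(r"(?<=\[REQUESTOR\]\t)[a-zA-Z0-9]+", body)
# The lookbehind is not part of the match, so group(0) is the value alone.
requestor = match.group(0) if match else None
print(requestor)
```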

Alternatively, store the matched line [REQUESTOR] POLM4 in a variable, say var1, and replace whatever ^[^\]]*\]\s* matches in var1 with an empty string. This removes everything up to and including the ], plus the whitespace after it, leaving the required string POLM4.
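
The strip-the-header idea can be sketched as follows, assuming Python and a hypothetical variable var1 that already holds the matched line. The pattern used here, ^[^\]]*\]\s*, stops at the first ] and also consumes the whitespace after it, so no leading tab is left on the value:

```python
import re

# var1 holds the line previously matched by the original expression.
var1 = "[REQUESTOR]\tPOLM4"

# Delete everything from the start of the line through the first "]",
# plus any whitespace that follows it.
value = re.sub(r"^[^\]]*\]\s*", "", var1)
print(value)
```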

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow