Question

I have a .NET application that uses the .NET Regex features to match the text of an EPL label. Normally I would use the pattern ^[A-Z0-9,]+"(.+)"$, which matches every line and captures the text between the quotes of each EPL command. However, the EPL has recently changed, and every EPL line now ends with a carriage return and line feed (CR LF, \x0D\x0A).

So I changed the pattern to [((\r\n)|(\x0D\x0A))A-Z0-9,]+"(.+)", and now it only picks up the "Keep out of the reach of children" line and doesn't recognise the rest.
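A likely part of the problem is that [...] is a character class. It always matches exactly one character from a set, and inside it parentheses and | are ordinary characters, not grouping or alternation. The minimal C# check below (an illustration, not part of the original question) tests the bracketed part on its own, anchored with ^ and \z so it must match a single character:

```csharp
using System;
using System.Text.RegularExpressions;

// Only the bracketed part of the attempted pattern, anchored so it must match exactly one character.
const string attemptedClass = @"^[((\r\n)|(\x0D\x0A))A-Z0-9,]\z";

// Inside [...] the parentheses and | are literal set members; \r\n and \x0D\x0A just add CR and LF to the set.
string[] singles = { "(", ")", "|", "\r", "\n", "A", "7", "," };
foreach (string c in singles)
{
    Console.WriteLine($"{Show(c)} matches: {Regex.IsMatch(c, attemptedClass)}");
}

// The two-character CR LF sequence is not treated as a unit: a class consumes one character at a time.
bool crlfAsUnit = Regex.IsMatch("\r\n", attemptedClass);
Console.WriteLine($"\\r\\n matches: {crlfAsUnit}");

// Makes CR and LF visible in the console output.
static string Show(string s) => s.Replace("\r", "\\r").Replace("\n", "\\n");
```

With the + after the class, CR, LF, parentheses and | are all simply accepted one by one as part of the command prefix.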

How can I match the text between the EPL codes?

This is the raw EPL I'm trying to match (each 0D0A is the CR LF byte pair):

N 0D0A
A230,1,0,2,1,1,N,"Keep out of the reach of children"0D0A
A133,26,0,4,1,1,N," FUROSEMIDE TABLETS 40 MG"0D0A
A133,51,0,4,1,1,N," ONE IN THE MORNING"0D0A
A133,76,0,4,1,1,N,""0D0A
A133,101,0,4,1,1,N,""0D0A
A133,126,0,4,1,1,N,""0D0A
A133,151,0,4,1,1,N,""0D0A
A133,176,0,4,1,1,N,"19/04/13 28 TABLET(S)"0D0A
A133,201,0,4,1,1,N,"ELIZABETH M SMITH"0D0A
LO133,232,550,40D0A
A133,242,0,2,1,1,N,"Any Medical Centre,Blue Road"0D0A
A133,260,0,2,1,1,N,"DN54 5TZ,Tel:01424 503901"0D0A
P1

Solution

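I think you're looking for the RegexOptions.Multiline option, which makes ^ and $ match at the start and end of every line instead of only at the start and end of the whole string. As in: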

Regex myEx = new Regex("^[A-Z0-9,]+\".+?\"$", RegexOptions.Multiline);

Actually, the regular expression should be:

"^[A-Z0-9,]+\".*\"\r?$"

With Multiline, $ matches only before the newline character \n (or at the very end of the input); it does not treat \r as a line ending. Your file has Windows line endings (\r\n), so after the closing quote the regex sees \r rather than \n, and $ fails. The modified regex consumes an optional \r before $, so it works with both \n and \r\n line endings.
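A minimal C# check of that explanation, assuming a short CR LF sample modelled on the EPL in the question (the variable names are illustrative). It counts the Multiline matches with and without the optional \r:

```csharp
using System;
using System.Text.RegularExpressions;

// A few label lines with Windows (CR LF) line endings, modelled on the question's EPL.
string epl = "N\r\n" +
             "A230,1,0,2,1,1,N,\"Keep out of the reach of children\"\r\n" +
             "A133,26,0,4,1,1,N,\" FUROSEMIDE TABLETS 40 MG\"\r\n" +
             "A133,76,0,4,1,1,N,\"\"\r\n" +
             "P1";

// $ in Multiline mode only matches before '\n', so the '\r' after each closing quote blocks it.
int strict = Regex.Matches(epl, "^[A-Z0-9,]+\".*\"$", RegexOptions.Multiline).Count;

// Allowing an optional '\r' before $ lets every quoted line match, including the empty "".
int tolerant = Regex.Matches(epl, "^[A-Z0-9,]+\".*\"\r?$", RegexOptions.Multiline).Count;

// prints: without \r?: 0, with \r?: 3
Console.WriteLine($"without \\r?: {strict}, with \\r?: {tolerant}");
```

Note that . does match \r (it excludes only \n), which is why the CR ends up between the closing quote and the \n that $ is looking for.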

If you don't want the trailing \r in your results, wrap the part you want in a capture group and read it from Groups[1]:

"^([A-Z0-9,]+\".*\")\r?$"

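A small C# sketch, with an illustrative two-line sample, showing that group 1 of that pattern excludes the \r even though the full match (m.Value) still includes it:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Two label lines with CR LF endings, modelled on the question's EPL.
string epl = "A230,1,0,2,1,1,N,\"Keep out of the reach of children\"\r\n" +
             "A133,26,0,4,1,1,N,\" FUROSEMIDE TABLETS 40 MG\"\r\n";

// Group 1 ends at the closing quote, before the optional \r.
var rx = new Regex("^([A-Z0-9,]+\".*\")\r?$", RegexOptions.Multiline);

var lines = new List<string>();
foreach (Match m in rx.Matches(epl))
{
    // m.Value includes the \r consumed by \r?; m.Groups[1].Value does not.
    bool valueEndsWithCr = m.Value.EndsWith("\r", StringComparison.Ordinal);
    Console.WriteLine($"Value ends with CR: {valueEndsWithCr}, group 1: {m.Groups[1].Value}");
    lines.Add(m.Groups[1].Value);
}
```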
Or you can strip the trailing \r by calling Trim on each match:

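MatchCollection matches = myEx.Matches(text);
List<string> lines = new List<string>();  // needs using System.Collections.Generic;
foreach (Match m in matches)
{
    lines.Add(m.Value.Trim());  // Trim removes the trailing \r (and any other leading/trailing whitespace)
}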

OTHER TIPS

Thanks Jim, I tried your suggestions and they worked.

This is what I used:

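Imports System.Text.RegularExpressions

Public Module EplParser

    ' ParseEplLabel is an illustrative name; the original snippet was the body of a function returning String.
    Public Function ParseEplLabel() As String
        ' Sample EPL. Assumes each 0D0A in the raw dump is a real CR LF pair, so the lines are joined with vbCrLf.
        ' Embedded quotes are doubled, as VB string literals require.
        Dim sText As String = String.Join(vbCrLf, New String() {
            "N",
            "A230,1,0,2,1,1,N,""Keep out of the reach of children""",
            "A133,26,0,4,1,1,N,"" FUROSEMIDE TABLETS 40 MG""",
            "A133,51,0,4,1,1,N,"" ONE IN THE MORNING""",
            "A133,76,0,4,1,1,N,""""",
            "A133,101,0,4,1,1,N,""""",
            "A133,126,0,4,1,1,N,""""",
            "A133,151,0,4,1,1,N,""""",
            "A133,176,0,4,1,1,N,""19/04/13 28 TABLET(S)""",
            "A133,201,0,4,1,1,N,""ELIZABETH M SMITH""",
            "LO133,232,550,4",
            "A133,242,0,2,1,1,N,""Any Medical Centre,Blue Road""",
            "A133,260,0,2,1,1,N,""CN54 1TZ,Tel:01424 503901""",
            "P1"})
        Dim sRet As String = String.Empty
        Dim sTemp As String = String.Empty
        Dim m As Match
        Dim grp As System.Text.RegularExpressions.Group

        ' First pass: Multiline so ^ and $ work per line; \r? absorbs the CR before each LF
        Dim sPattern As String = "^([A-Z0-9,])+\"".*\""\r?$"
        Dim sPatternRegex As New Regex(sPattern, RegexOptions.Multiline)
        Dim matches As MatchCollection = sPatternRegex.Matches(sText)

        For Each m In matches
            ' Trim removes the trailing \r
            sTemp += m.Value.Trim() + vbCrLf
        Next

        ' The first pass keeps only the lines that contain quoted text, strips their trailing CR and
        ' rejoins them with a standard vbCrLf; the second pass below then parses each line as before.

        ' Standard WinPrint EPL label pattern; after VB unescapes the "" it reads: ^[A-Z0-9,]+\"(.+)\"$
        sPattern = "^[A-Z0-9,]+\""(.+)\""$"

        ' Split on the full CR LF string (not just its first character) and skip the empty trailing entry
        For Each s As String In sTemp.Split(New String() {vbCrLf}, StringSplitOptions.RemoveEmptyEntries)
            m = Regex.Match(s.Trim(), sPattern)
            grp = m.Groups(1)
            sRet += grp.Value + vbCrLf
        Next

        Return sRet.Trim()
    End Function

End Module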
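For comparison, the two passes can probably be collapsed into a single Multiline pass by putting the capture group inside the quotes. This is a C# sketch of that alternative, not the poster's code: ExtractLabelText is a hypothetical helper name, and the sample is a shortened version of the question's EPL with real CR LF pairs where the dump shows 0D0A. It uses (.*) instead of (.+) so that empty "" fields still produce an entry.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Shortened sample modelled on the question's EPL, joined with real CR LF pairs.
string epl = string.Join("\r\n",
    "N",
    "A230,1,0,2,1,1,N,\"Keep out of the reach of children\"",
    "A133,26,0,4,1,1,N,\" FUROSEMIDE TABLETS 40 MG\"",
    "A133,76,0,4,1,1,N,\"\"",
    "A133,201,0,4,1,1,N,\"ELIZABETH M SMITH\"",
    "LO133,232,550,4",
    "A133,242,0,2,1,1,N,\"Any Medical Centre,Blue Road\"",
    "P1");

List<string> texts = ExtractLabelText(epl);
Console.WriteLine(string.Join(Environment.NewLine, texts));

// Hypothetical helper: one Multiline pass that captures only the text between the quotes.
static List<string> ExtractLabelText(string eplText)
{
    // (.*) so empty "" fields give an empty entry; \r? absorbs the CR that precedes each LF.
    var rx = new Regex("^[A-Z0-9,]+\"(.*)\"\r?$", RegexOptions.Multiline);
    var result = new List<string>();
    foreach (Match m in rx.Matches(eplText))
    {
        result.Add(m.Groups[1].Value);
    }
    return result;
}
```

Lines without quoted text (N, LO133,... and P1) are skipped, and leading spaces inside the quotes, as in " FUROSEMIDE TABLETS 40 MG", are kept.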
Licensed under: CC-BY-SA with attribution