Question

I need to process some legacy data and parse some loosely structured text fields. Instead of trying regex matches, I was thinking of building a simple grammar definition and using a tool to parse the strings based on it.

Some example data from one of the columns to parse:

08-JUL-13 To 09-AUG-13   BREAKFAST  0900 LUNCH  1230

or

08-JUL-13 To 22-AUG-13   LUNCH  1230

or

08 JUL 13 To 16 AUG 13  EAST WARD LUN  0200

My grammar would be something like the following. What is the correct regex pattern?

DateRange:[DateWithOrWithoutDashes TO DateWithOrWithoutDashes]  {BlaBla}0..* {Break* Time}0..1 {Lun Time}0..1

Solution

You can try the following regex:

^(?<start_date>\d{2}[-\s][A-Z]{3}[-\s]\d{2})\s+To\s+(?<end_date>\d{2}[-\s][A-Z]{3}[-\s]\d{2})\s+(?:(?<type>[A-Z\s]+?)\s+(?<time>\d{4})\s*)+

C# code sample:

using System;
using System.Text.RegularExpressions;

string[] lines = {
                     "08-JUL-13 To 09-AUG-13   BREAKFAST  0900 LUNCH  1230",
                     "08-JUL-13 To 22-AUG-13   LUNCH  1230",
                     "08 JUL 13 To 16 AUG 13  EAST WARD LUN  0200"
                 };
foreach (string line in lines)
{
    Match m = Regex.Match(line, @"^(?<start_date>\d{2}[-\s][A-Z]{3}[-\s]\d{2})\s+To\s+(?<end_date>\d{2}[-\s][A-Z]{3}[-\s]\d{2})\s+(?:(?<type>[A-Z\s]+?)\s+(?<time>\d{4})\s*)+");
    if (m.Success)
    {
        Console.WriteLine("Start date: {0}", m.Groups["start_date"].Value);
        Console.WriteLine("End date: {0}", m.Groups["end_date"].Value);
        for (int i = 0; i < m.Groups["type"].Captures.Count; i++)
        {
            Console.WriteLine("Event type[{0}]: {1}", i, m.Groups["type"].Captures[i].Value);
            Console.WriteLine("Event time[{0}]: {1}", i, m.Groups["time"].Captures[i].Value);
        }
        Console.WriteLine();
    }
}

Output:

Start date: 08-JUL-13
End date: 09-AUG-13
Event type[0]: BREAKFAST
Event time[0]: 0900
Event type[1]: LUNCH
Event time[1]: 1230

Start date: 08-JUL-13
End date: 22-AUG-13
Event type[0]: LUNCH
Event time[0]: 1230

Start date: 08 JUL 13
End date: 16 AUG 13
Event type[0]: EAST WARD LUN
Event time[0]: 0200
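
If the captured strings need to become typed values, something along these lines may work. This is only a minimal sketch and not part of the answer above: the class name ScheduleFields and its two helper methods are hypothetical, and it assumes that the month abbreviations are English and that the four-digit times are 24-hour HHmm values.

```csharp
using System;
using System.Globalization;

static class ScheduleFields
{
    // Accepts both "08-JUL-13" and "08 JUL 13".
    // Month abbreviations are matched case-insensitively by DateTime.ParseExact.
    static readonly string[] DateFormats = { "dd-MMM-yy", "dd MMM yy" };

    // Hypothetical helper: converts a captured start_date/end_date string.
    // Throws FormatException if the text fits neither format.
    public static DateTime ParseDate(string text) =>
        DateTime.ParseExact(text, DateFormats,
                            CultureInfo.InvariantCulture, DateTimeStyles.None);

    // Hypothetical helper: converts a captured time such as "0900" or "1230",
    // assuming two digits of hours followed by two digits of minutes.
    public static TimeSpan ParseTime(string text)
    {
        int hours = int.Parse(text.Substring(0, 2), CultureInfo.InvariantCulture);
        int minutes = int.Parse(text.Substring(2, 2), CultureInfo.InvariantCulture);
        return new TimeSpan(hours, minutes, 0);
    }
}
```

Inside the loop above it could be called as ScheduleFields.ParseDate(m.Groups["start_date"].Value) and ScheduleFields.ParseTime(m.Groups["time"].Captures[i].Value). Two-digit years are expanded according to the calendar's TwoDigitYearMax setting, so it is worth checking that this suits the legacy data.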

OTHER TIPS

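This pattern will also match all three of your examples. The event name is allowed to consist of several words separated by single spaces, which the third example (EAST WARD LUN) requires.

([0-9])+(-| )([A-Z])+(-| )([0-9])+(-| )+(To)(-| )+([0-9])+(-| )([A-Z])+(-| )([0-9])+(( )+([A-Z])+(( )([A-Z])+)*( )+([0-9])+)+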
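
Before running any candidate pattern over the whole column, it may be worth checking it against the sample rows from the question. The sketch below is one possible way to do that. The class name PatternCheck and the pattern in Candidate are assumptions made for illustration: Candidate is a simplified, anchored variant written for this check, not a verbatim copy of any pattern above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class PatternCheck
{
    // The three sample rows from the question.
    public static readonly string[] Samples =
    {
        "08-JUL-13 To 09-AUG-13   BREAKFAST  0900 LUNCH  1230",
        "08-JUL-13 To 22-AUG-13   LUNCH  1230",
        "08 JUL 13 To 16 AUG 13  EAST WARD LUN  0200"
    };

    // Hypothetical simplified pattern: date, "To", date, then one or more
    // "NAME time" pairs, where NAME is one or more words split by single spaces.
    public const string Candidate =
        @"^\d+[- ][A-Z]+[- ]\d+ +To +\d+[- ][A-Z]+[- ]\d+( +[A-Z]+( [A-Z]+)* +\d+)+$";

    // Returns the sample rows that the pattern fails to match
    // (an empty array means every sample matched).
    public static string[] Unmatched(string pattern) =>
        Samples.Where(s => !Regex.IsMatch(s, pattern)).ToArray();
}
```

Calling PatternCheck.Unmatched(PatternCheck.Candidate) and printing the result shows which rows, if any, still need attention. The same call can be repeated with any other pattern.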
Licensed under: CC-BY-SA with attribution