Question

I have the following code in a Windows C# form that parses a textbox string. The typical string would look something like:

z5 100c x87.50.

Another example might be:

m5 100c vs z5 100c.

In both examples, several attributes are important: "z5", "100", "c", "x", "87.50". I feed the pieces of this string into various functions (the functions themselves are less important here).

I pieced together the code below from various samples, but when I run it, it only parses the first match instead of iterating through and displaying every match. For example, "z5 100c x87.50" returns 5 and c.

The relevant information in the string m5 100c vs z5 100c. is as follows: "m" is a month symbol. "5" is a year. "100" is a price. "c" is a structure. "vs" is a pricing function, which calls a difference function. "z" is a month symbol. "5" is a year. "100" is a separate price. "c" is a separate structure.

Is there a better method for reading the entire string, then parsing and extracting the relevant information?

private void toolStripButton2_Click(object sender, EventArgs e)
{
    string contract = toolStripTextBox1.ToString();
    string contractConvert = contract.ToLower();

    Regex re = new Regex("c$\\.?|p$\\.?|s$\\.?|f$\\.?|cs\\.?|ps\\.?|vs\\.?|x\\.?");
    Regex rePrice = new Regex("[0-9]{1,4}(\\.[0-9]{1,2})?");

    Match m = re.Match(contractConvert.ToString());
    Match mPrice = rePrice.Match(contract.ToString());

    if (m.Success)
    {
        MessageBox.Show(string.Format("Structure: " + m.Value));
    }
    else
    {
        MessageBox.Show("Structure incorrect!");
    }

    if (mPrice.Success)
    {
        MessageBox.Show(string.Format("Strike: " + mPrice.Value));
    }
    else
    {
        MessageBox.Show("Structure incorrect! Requires a strike.");
    }
}

Solution

I believe this regex would help you break up your string into the relevant components:

([A-Za-z]{1,}[0-9.]*|[0-9.]{1,}[A-Za-z]*)

Just use match collections like so:

  string pattern = "([A-Za-z]{1,}[0-9.]*|[0-9.]{1,}[A-Za-z]*)";
  string input = "z5 100c x87.50.";

  MatchCollection matches = Regex.Matches(input, pattern);

  foreach (Match match in matches)
  {
     Console.WriteLine(match.Groups[1].Value);
  }

would give you:

z5
100c
x87.50.

and then you could further analyze as needed.
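As one possible way to do that further analysis, here is a minimal sketch that splits each token into its letter part and its number part. The class name TokenSplitter and the method Split are made up for illustration, and the number pattern is tightened on the assumption that a trailing period ends the sentence rather than belonging to the price.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class TokenSplitter
{
    // Assumption: a number is digits plus an optional fractional part,
    // so a trailing sentence period (as in "x87.50.") stays out of the match.
    private const string NumberPattern = @"[0-9]+(?:\.[0-9]+)?";

    // Letters with an optional number, or a number with optional letters.
    private static readonly Regex Token =
        new Regex("[A-Za-z]+(?:" + NumberPattern + ")?|" + NumberPattern + "[A-Za-z]*");

    // Returns one (letters, number) pair per token; either part may be empty.
    public static List<(string Letters, string Number)> Split(string input)
    {
        var parts = new List<(string Letters, string Number)>();
        foreach (Match token in Token.Matches(input))
        {
            string letters = Regex.Match(token.Value, "[A-Za-z]+").Value;
            string number = Regex.Match(token.Value, NumberPattern).Value;
            parts.Add((letters, number));
        }
        return parts;
    }
}
```

Calling TokenSplitter.Split("z5 100c x87.50.") should yield the pairs (z, 5), (c, 100) and (x, 87.50), which you can then hand to whatever functions need them.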

You could even do it all at once using named match groups to make your life a bit easier... something like:

  string pattern = "(?<price_structure>[0-9.]{1,}[c]{1,})|(?<year_month>[z]{1,}[0-9]{1,})";
  string input = "z5 100c x87.50.";

  MatchCollection matches = Regex.Matches(input, pattern);

  foreach (Match match in matches)
  {
     Console.WriteLine("price-structure: " + match.Groups["price_structure"].Value);
     Console.WriteLine("year-month: " + match.Groups["year_month"].Value);
  }

which would give you:

price-structure:
year-month: z5
price-structure: 100c
year-month:

If you want to break this down even further, you could do something like the following (note that + below is equivalent to {1,} in the examples above):

(?<price>[0-9.]+)(?<structure>[c]+)|(?<year>[zx]+)(?<month>[0-9.]+)

I am separating price/structure and year/month with the alternation operator | to illustrate how you can keep groups together when context matters, for instance if c should only mean "structure" when it directly follows a price. I have also added x to the year set to illustrate how easily you can add other characters to the set of viable match characters, as PhatWrat points out below.

The new regex will result in:

z5 has 4 groups:
    (price)
    (structure)
    z (year)
    5 (month)
100c has 4 groups:
    100 (price)
    c (structure)
    (year)
    (month)
x87.50. has 4 groups:
    (price)
    (structure)
    x (year)
    87.50. (month)
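The empty entries in the listing above are groups that belong to the other side of the | and so did not take part in that match. A small sketch of how you might skip them in code, using Group.Success, follows. The class name GroupReport and the method Describe are hypothetical; the pattern is the one shown above, unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class GroupReport
{
    private static readonly Regex Pattern = new Regex(
        @"(?<price>[0-9.]+)(?<structure>[c]+)|(?<year>[zx]+)(?<month>[0-9.]+)");

    private static readonly string[] Names = { "price", "structure", "year", "month" };

    // Returns "name = value" only for the groups that took part in each match.
    public static List<string> Describe(string input)
    {
        var lines = new List<string>();
        foreach (Match match in Pattern.Matches(input))
        {
            foreach (string name in Names)
            {
                Group group = match.Groups[name];
                if (group.Success)
                    lines.Add(name + " = " + group.Value);
            }
        }
        return lines;
    }
}
```

For "z5 100c x87.50." this should return six lines: year = z, month = 5, price = 100, structure = c, year = x, month = 87.50. (the trailing period is still captured because the character class includes it).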

You can try this out with this online testing site: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx

OTHER TIPS

Yes, I would take a look at Irony, a parser that lets you create a syntax tree in a few minutes (once you are past the learning curve, obviously).

You'll find good samples around the net.

Here's another idea--you can use capture groups. I'm sure there's a more elegant way of doing it, but it would go something like this:

First, set up your captures (I've only included 3 here--you'll need to add whatever else you want to support)

Regex myRe = new Regex(@"m(?<month>\d+)|(?<price>\d+)c|z(?<compMonth>\d+)");

Then use "Matches"

var myMatches = myRe.Matches(stringToSearch);

Finally (and I think this could probably be refactored, for those who are better at regex than I am) go through your matches and look for your groups:

foreach (Match myMatch in myMatches)
{
    // Each match fills exactly one of the named groups; Success tells you which.
    if (myMatch.Groups["month"].Success)
        MessageBox.Show("Month = " + myMatch.Groups["month"].Value);

    if (myMatch.Groups["price"].Success)
        MessageBox.Show("Price = " + myMatch.Groups["price"].Value);

    if (myMatch.Groups["compMonth"].Success)
        MessageBox.Show("Other Month = " + myMatch.Groups["compMonth"].Value);
}

For input "m5 100c vs z5 100c." your output would be:

Month = 5
Price = 100
Other Month = 5
Price = 100
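If you want to cover the other attributes from the question as well, the same idea might be extended roughly as below. This is only a sketch under assumptions: the labels follow the question's description (the letter is the month symbol, the single digit is the year), the structure codes c, p, s, f, cs, ps and the vs function are taken from the question's own regex, and the group name xPrice for the number after x is invented because the question does not say what x means. The class name ContractTokenizer is likewise hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; names and grammar are assumptions based on the question.
public static class ContractTokenizer
{
    private static readonly Regex TokenPattern = new Regex(@"
          \b(?<month>[a-z])(?<year>\d)\b                               # m5, z5
        | \b(?<price>\d{1,4}(?:\.\d{1,2})?)(?<structure>cs|ps|[cpsf])\b # 100c
        | \b(?<function>vs)\b                                           # vs
        | \bx(?<xPrice>\d{1,4}(?:\.\d{1,2})?)                           # x87.50
        ", RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

    private static readonly string[] Names =
        { "month", "year", "price", "structure", "function", "xPrice" };

    // Returns "name=value" entries in the order they appear in the input.
    public static List<string> Tokenize(string input)
    {
        var result = new List<string>();
        foreach (Match match in TokenPattern.Matches(input))
        {
            foreach (string name in Names)
            {
                Group group = match.Groups[name];
                if (group.Success)
                    result.Add(name + "=" + group.Value.ToLowerInvariant());
            }
        }
        return result;
    }
}
```

The month alternative requires a word boundary right after the single year digit, which is what keeps "x87.50" from being read as month x, year 8. If x can also be a month symbol in your data, you would need a stricter rule than this sketch provides.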

By the way, I suggest this cheat sheet and regexpal as useful RegEx resources.

Licensed under: CC-BY-SA with attribution