Question

To summarize: How can I prevent my regex pattern from mistaking parts of a string for a whole-word variable name? It is replacing letters that are part of a bigger word even though I use word boundaries (\b).

What I am trying to do: I am working on a calculator. It has a list of variables, and before passing the expression to the parser I call my function ParseVars() to do a regex_search using the pattern for variable matching. Once it has all the tokens that match my variable pattern, I check whether each string is in fact in the list of variable names. If so, I replace the string with the variable's value. Also, every time a calculation is made in the parser, I define a constant with the name ans1, ans2, and so on.

The problem is: Let's say I have a variable defined, named a, and its value is 6. (By the way, I keep track of these in a map<string,double> Vars;.) When I do ParseVars("ans1"), the resulting string is "ans1". Also with ParseVars(), the string ans1+ans2+9 stays the same, and the string 9+a becomes 9+6. So far, my regex works as expected.

BUT, if I do ParseVars("ans1+a"), the resulting string is "6ns1+6". I am confused as to why the word boundaries in my regular expression only fail in this case: 'a' can always be found in 'ans1', but it only gets replaced if 'a' also appears alone somewhere else in the string.

What I have: Here is my regex pattern: \b([a-z][a-z0-9_]*)\b Shouldn't this only match whole words? The word boundary works fine until 'a' is alone elsewhere in the string. Maybe it's my ParseVars() function, here is the code:

map<string,double> Vars;

// Variables must be a whole word, start with a letter, and
// optionally have other letters, numbers, and underscores.
sregex VarPattern = sregex::compile("\\b([a-z][a-z0-9_]*)\\b");

string Calculator::ParseVars(string expr) {
    if (Vars.empty()) return expr;

    string newExpr = StrToLower(expr);
    const sregex_iterator End;

    // Loop through all possible variable matches
    for (sregex_iterator i(expr.begin(), expr.end(), VarPattern); i != End; ++i) {
        string name = (*i)[0];

        // If it is a variable
        if (Vars.find(name) != Vars.end()) {
            int rPos = 0;

            // Replace all occurrences of it
            while ((rPos = newExpr.find(name, rPos)) != string::npos) {
                newExpr.replace(
                    rPos, name.length(),
                    lexical_cast<string,double>(Vars[name])
                );
            }
        }
    }

    return newExpr;
}

With a being equal to 6, how can I prevent ans1+a from becoming 6ns1+6 instead of the desired ans1+6?
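One way to narrow a problem like this down is to check the matching step and the replacing step separately. The sketch below is only an illustration: it uses std::regex instead of the Boost.Xpressive types from the code above (an assumption made to keep it self-contained), and MatchTokens and FirstPlainHit are hypothetical helper names that do not appear in the original code.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every token the variable pattern matches.
std::vector<std::string> MatchTokens(const std::string& expr) {
    // Same pattern text as in the question, compiled with std::regex
    // (assumption: the post itself uses Boost.Xpressive's sregex).
    static const std::regex varPattern(R"(\b([a-z][a-z0-9_]*)\b)");
    std::vector<std::string> tokens;
    for (std::sregex_iterator it(expr.begin(), expr.end(), varPattern), end;
         it != end; ++it) {
        tokens.push_back(it->str());
    }
    return tokens;
}

// Hypothetical helper: position of the first hit of a plain substring
// search. std::string::find knows nothing about word boundaries.
std::string::size_type FirstPlainHit(const std::string& expr,
                                     const std::string& name) {
    return expr.find(name);
}
```

For "ans1+a", MatchTokens yields the two whole tokens ans1 and a, while FirstPlainHit("ans1+a", "a") reports position 0, which is inside ans1. Comparing the two results shows which step ignores the word boundaries.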


Solution

Well, I found the solution. I'm putting my answer here for anybody who has run across a similar problem.

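The problem was that I was using a plain string replace after the regex had matched. The word boundaries in the pattern worked correctly; it was the string find/replace loop that replaced every occurrence of the name, regardless of word boundaries. That is also why ans1 was only damaged when a appeared on its own: only then did a match as a variable and trigger the replace loop. The fix is to do the replacement with regex_replace() and a word-bounded pattern built from the name. Here is what I ended up with: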

map<string,double> Vars;

// Variables must be a whole word, start with a letter, and
// optionally have other letters, numbers, and underscores.
sregex VarPattern = sregex::compile("\\b([a-z][a-z0-9_]*)\\b");

string Calculator::ParseVars(string expr) {
    if (Vars.empty()) return expr;
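    // Lowercase once and scan the lowercased copy, so the [a-z] pattern
    // also sees names that were typed in upper case.
    const string lowered = StrToLower(expr);
    string newExpr = lowered;
    const sregex_iterator End;

    // Loop through all possible variable matches
    for (sregex_iterator i(lowered.begin(), lowered.end(), VarPattern); i != End; ++i) {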
        string name = (*i)[0];

        // If it is a variable
        if (Vars.find(name) != Vars.end()) {
            sregex rgxName = sregex::compile("\\b" + name + "\\b");

            // Replace all occurrences of it
            newExpr = xpressive::regex_replace(
                newExpr, rgxName,
                lexical_cast<string,double>(Vars[name])
            );
        }
    }

    return newExpr;
}
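
A possible alternative, sketched here under some assumptions, is to build the result in a single pass: each matched token is either copied through or replaced by its value, so no second search over the string is needed and no regex has to be compiled per variable name. This is not the code from the answer above. It uses std::regex rather than Boost.Xpressive, SubstituteVars is a hypothetical free function, its vars parameter stands in for the Vars member, and it assumes the input has already been lowercased. Values are formatted with a default std::ostringstream, which may print fewer digits than lexical_cast does.

```cpp
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical single-pass variant of ParseVars.
// `vars` stands in for the Vars map; `expr` is assumed to be lowercase.
std::string SubstituteVars(const std::string& expr,
                           const std::map<std::string, double>& vars) {
    static const std::regex varPattern(R"(\b[a-z][a-z0-9_]*\b)");
    std::string result;
    auto last = expr.begin();  // end of the previously handled token

    for (std::sregex_iterator it(expr.begin(), expr.end(), varPattern), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        result.append(last, m[0].first);  // text between tokens, unchanged

        auto found = vars.find(m.str());
        if (found != vars.end()) {
            std::ostringstream value;     // default stream formatting
            value << found->second;
            result += value.str();        // known variable: insert its value
        } else {
            result += m.str();            // unknown name: keep the token
        }
        last = m[0].second;
    }
    result.append(last, expr.end());      // trailing text after the last token
    return result;
}
```

With a mapped to 6, SubstituteVars("ans1+a", vars) gives "ans1+6". Because only whole matched tokens are ever replaced, a variable name can never be substituted inside a longer identifier, and a value that has just been inserted is never scanned again.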
Licensed under: CC-BY-SA with attribution