Question

I'm looking for a clean C++ way to parse a string containing expressions wrapped in ${} and build a result string from the programmatically evaluated expressions.

Example: "Hi ${user} from ${host}" will be evaluated to "Hi foo from bar" if I implement the program to let "user" evaluate to "foo", etc.

The current approach I'm thinking of consists of a state machine that eats one character at a time from the string and evaluates the expression after reaching '}'. Any hints or other suggestions?

Note: boost:: is most welcome! :-)

Update: Thanks for the first three suggestions! Unfortunately, I made the example too simple. I need to be able to examine the contents within ${}, so it's not a simple search and replace. For example, it might say ${uppercase:foo}, in which case I have to use "foo" as a key in a hashmap and then convert the value to uppercase. I tried to avoid the inner details of ${} when writing the original question above... :-)


Solution

#include <iostream>
#include <map>
#include <string>

using namespace std;

struct Token
{
    enum E
    {
        Replace,    // text found inside ${...}
        Literal,    // plain text outside any ${...}
        Eos         // end of the input string
    };
};

class ParseExp
{
private:
    enum State
    {
        State_Begin,
        State_Literal,
        State_StartRep,
        State_RepWord
    };

    string              m_str;
    string::size_type   m_char;     // index of the next character to read
    string::size_type   m_length;
    string              m_lexeme;
    Token::E            m_token;
    State               m_state;

public:
    ParseExp() : m_char(0), m_length(0), m_token(Token::Eos), m_state(State_Begin) {}

    void Parse(const string& str)
    {
        m_char = 0;
        m_str = str;
        m_length = str.size();
    }

    Token::E NextToken()
    {
        m_lexeme.clear();

        // Nothing left to read: report end of string right away.
        if (m_char >= m_length)
        {
            m_token = Token::Eos;
            return m_token;
        }

        m_state = State_Begin;
        bool stop = false;
        while (m_char < m_length && !stop)
        {
            char ch = m_str[m_char++];
            switch (m_state)
            {
            case State_Begin:
                if (ch == '$')
                {
                    m_state = State_StartRep;
                    m_token = Token::Replace;
                    continue;
                }
                else
                {
                    m_state = State_Literal;
                    m_token = Token::Literal;
                }
                break;

            case State_StartRep:
                // Skip everything up to and including the opening '{'.
                if (ch == '{')
                    m_state = State_RepWord;
                continue;

            case State_RepWord:
                if (ch == '}')
                {
                    stop = true;
                    continue;
                }
                break;

            case State_Literal:
                if (ch == '$')
                {
                    // Leave the '$' for the next call to NextToken().
                    stop = true;
                    m_char--;
                    continue;
                }
                break;
            }

            m_lexeme += ch;
        }

        return m_token;
    }

    const string& Lexeme() const
    {
        return m_lexeme;
    }

    // Not named Token(): a member with that name would clash with struct Token.
    Token::E CurrentToken() const
    {
        return m_token;
    }
};

string DoReplace(const string& str, const map<string, string>& dict)
{
    ParseExp exp;
    exp.Parse(str);
    string ret;
    while (exp.NextToken() != Token::Eos)
    {
        if (exp.CurrentToken() == Token::Literal)
            ret += exp.Lexeme();
        else
        {
            map<string, string>::const_iterator iter = dict.find(exp.Lexeme());
            if (iter != dict.end())
                ret += iter->second;
            else
                ret += "undefined(" + exp.Lexeme() + ")";
        }
    }
    return ret;
}

int main()
{
    map<string, string> words;
    words["hello"] = "hey";
    words["test"] = "bla";
    // Prints: hey world bla undefined(undef)
    cout << DoReplace("${hello} world ${test} ${undef}", words) << '\n';
}

I will be happy to explain anything about this code :)
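
The update in the question asks for expressions such as ${uppercase:foo}. One way the Replace branch could be extended is to hand the text between the braces to a small evaluator instead of looking it up directly. The following is only a sketch: EvaluateExpression is a hypothetical helper name, the "modifier:key" syntax is taken from the question's example, and "uppercase" is the only modifier it understands.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper: evaluates the text found between "${" and "}".
// Accepts either "key" or "modifier:key". Only "uppercase" is handled;
// any other modifier leaves the looked-up value unchanged.
std::string EvaluateExpression(const std::string& expr,
                               const std::map<std::string, std::string>& dict)
{
    std::string modifier;
    std::string key = expr;

    // Split at the first ':' if there is one.
    const std::string::size_type colon = expr.find(':');
    if (colon != std::string::npos)
    {
        modifier = expr.substr(0, colon);
        key = expr.substr(colon + 1);
    }

    std::map<std::string, std::string>::const_iterator iter = dict.find(key);
    if (iter == dict.end())
        return "undefined(" + key + ")";

    std::string value = iter->second;
    if (modifier == "uppercase")
    {
        std::transform(value.begin(), value.end(), value.begin(),
                       [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    }
    return value;
}
```

With a dictionary that maps "foo" to "bar", EvaluateExpression("uppercase:foo", dict) would give "BAR", while EvaluateExpression("foo", dict) would give "bar".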

OTHER TIPS

How many expressions do you intend to evaluate? If the number is small enough, you might just want to use brute force.

For instance, if you have a std::map<string, string> that maps each key to its value (say, user to Matt Cruikshank), you could iterate over the entire map and replace every occurrence of "${" + key + "}" in your string with the corresponding value.
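
The brute-force idea above can be sketched roughly as follows. ReplaceAll is a hypothetical name, and the sketch assumes that values do not themselves contain ${...} placeholders; if they did, a key processed later in the map could expand them again.

```cpp
#include <map>
#include <string>

// Replaces every "${key}" in text with the matching value from dict.
// Placeholders whose key is not in dict are left untouched.
std::string ReplaceAll(std::string text,
                       const std::map<std::string, std::string>& dict)
{
    for (std::map<std::string, std::string>::const_iterator it = dict.begin();
         it != dict.end(); ++it)
    {
        const std::string placeholder = "${" + it->first + "}";
        std::string::size_type pos = 0;
        while ((pos = text.find(placeholder, pos)) != std::string::npos)
        {
            text.replace(pos, placeholder.size(), it->second);
            pos += it->second.size();   // continue after the inserted value
        }
    }
    return text;
}
```

Each key costs one full scan of the string, which is why this only makes sense for a small dictionary.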

Boost.Regex would be the route I'd suggest. The regex_replace algorithm should do most of your heavy lifting.

If you don't like my first answer, then dig into Boost.Regex, probably boost::regex_replace.
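
As a rough illustration of the regex route, here is a sketch that uses std::regex from C++11 rather than Boost, so that it builds without extra libraries; the standard interface was modelled on Boost.Regex. Because std::regex_replace only accepts a format string, the sketch walks the matches with std::sregex_iterator and looks each key up itself. ExpandWithRegex is a hypothetical name, and the "undefined(...)" fallback is borrowed from the accepted solution.

```cpp
#include <map>
#include <regex>
#include <string>

// Expands every ${key} in text using dict; unknown keys become "undefined(key)".
std::string ExpandWithRegex(const std::string& text,
                            const std::map<std::string, std::string>& dict)
{
    // Matches "${", then captures everything up to the next "}".
    static const std::regex pattern(R"(\$\{([^}]*)\})");

    std::string result;
    std::string::const_iterator last = text.begin();

    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end;
         it != end; ++it)
    {
        const std::smatch& m = *it;
        result.append(last, m[0].first);        // literal text before the match

        const std::string key = m[1].str();     // contents of ${...}
        std::map<std::string, std::string>::const_iterator found = dict.find(key);
        if (found != dict.end())
            result += found->second;
        else
            result += "undefined(" + key + ")";

        last = m[0].second;                     // resume after the closing '}'
    }
    result.append(last, text.end());            // trailing literal text
    return result;
}
```

Since the captured key is available as a string inside the loop, this is also a natural place to inspect it further, for example to split off a prefix such as "uppercase:".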

How complex can the expressions get? Are they just identifiers, or can they be actual expressions like "${numBad/(double)total*100.0}%"?

Do you have to use the ${ and } delimiters or can you use other delimiters?

You don't really care about parsing. You just want to generate and format strings with placeholder data in it. Right?

For a platform-neutral approach, consider the humble sprintf function. It is the most ubiquitous and does what I am assuming that you need. It works on char* buffers, so you will have to do some memory management yourself.
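
A minimal sketch of the printf-style approach, using std::snprintf (the bounds-checked sibling of sprintf) so that the buffer can be sized safely. FormatGreeting is a hypothetical name. Note that the placeholders here are positional %s conversions, not named ${...} expressions, so this only fits if the template text is under your control.

```cpp
#include <cstdio>
#include <string>

// Builds "Hi <user> from <host>" with printf-style formatting.
std::string FormatGreeting(const char* user, const char* host)
{
    // A call with a null buffer and size 0 only measures the required length.
    const int needed = std::snprintf(nullptr, 0, "Hi %s from %s", user, host);
    if (needed < 0)
        return std::string();   // formatting error

    // Reserve room for the text plus the terminating '\0'.
    std::string buffer(static_cast<std::string::size_type>(needed) + 1, '\0');
    std::snprintf(&buffer[0], buffer.size(), "Hi %s from %s", user, host);

    buffer.resize(static_cast<std::string::size_type>(needed));   // drop the '\0'
    return buffer;
}
```

Calling FormatGreeting("foo", "bar") would yield "Hi foo from bar".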

Are you using the STL? Then consider the std::basic_string::replace member function. It doesn't do exactly what you want, but you could make it work.

If you are using ATL/MFC, then consider the CStringT::Format method.

If you are managing the variables separately, why not go the route of an embeddable interpreter? I have used Tcl in the past, but you might try Lua, which is designed for embedding. Ruby and Python are two other interpreters that are easy to embed, but they aren't quite as lightweight. The strategy is to instantiate an interpreter (a context), add variables to it, then evaluate strings within that context. An interpreter will properly handle malformed input that could otherwise lead to security or stability problems for your application.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow