I once wrote my own version of Flex, which generated a set of classes instead of the whole program. Firstly, I had to parse the regular expressions by hand, but when I finally wrote it, I replaced the regular expression parsing mechanism with one generated by the program itself.
Manual parsing of the regular expression is actually quite simple. Firstly, you have to specify the result you want to achieve. For example in my case:
[abc]+test
Is being interpreted as:
[abc]@[abc]*@[t]@[e]@[s]@[t]
Which are actually equivalent (@ is an artificially added concatenation operator).
Then you have to create a set of rules, eg.
'[' spotted:
- (optionally) expect '^' character;
- repeat:
- expect a non-special character;
- If it is not last character and is succeeded by '-', expect another character
- until `]` is spotted
- Return a character set
'(' spotted:
- Return a block-begin
')' spotted:
- Return a block-end
'*' spotted:
- Return a star-operator
'+' spotted:
- Return a plus-operator
'.' spotted:
- Return a whole character set
Any other char spotted:
- Return a character set consisting of this single character
Algorithm written like this will give you a tokenizer - routine, which breaks elements into logical tokens. Then you'll have to process them into an expression tree and that may be solved by implementing a Reverse Polish Notation algorithm.
You can check my parser generator here, though it generates a Delphi code. Unfortunately, readme is in Polish, but there are a few examples inside. Try for instance:
Number=[0-9]+
Operator=[\+\-\*/]
And
SpkParserGenerator -i myfile.regex -mc -sg
By the way, you can generate a parser for yourself and then simply translate it from Delphi to C++, it's actually quite simple even if you don't know Delphi well.
This is a set of rules I used to generate parser for the parser generator:
SetRange=\{([0-9]*,[0-9]+)|([0-9]+,[0-9]*)|([0-9]+)\}
Star=\*
Plus=\+
QMark=\?
CharRange=\[\^?((\\.)|(\#[0-9]{3})|([^\\\#\]]))+\]
AnyChar=\.
EscapedChar=\\.
AsciiChar=\#[0-9]{3}
Char=[^\[\]\{\}\.\(\)\#\*\+\?\|\\]
OpenParenthesis=\(
CloseParenthesis=\)
Alternative=\|