Improving the readability of a FParsec parser
Вопрос
I have a hand-written CSS parser done in C# which is getting unmanageable and was trying to do it i FParsec to make it more mantainable. Here's a snippet that parses a css selector element made with regexes:
var tagRegex = @"(?<Tag>(?:[a-zA-Z][_\-0-9a-zA-Z]*|\*))";
var idRegex = @"(?:#(?<Id>[a-zA-Z][_\-0-9a-zA-Z]*))";
var classesRegex = @"(?<Classes>(?:\.[a-zA-Z][_\-0-9a-zA-Z]*)+)";
var pseudoClassRegex = @"(?::(?<PseudoClass>link|visited|hover|active|before|after|first-line|first-letter))";
var selectorRegex = new Regex("(?:(?:" + tagRegex + "?" + idRegex + ")|" +
"(?:" + tagRegex + "?" + classesRegex + ")|" +
tagRegex + ")" +
pseudoClassRegex + "?");
var m = selectorRegex.Match(str);
if (m.Length != str.Length) {
cssParserTraceSwitch.WriteLine("Unrecognized selector: " + str);
return null;
}
string tagName = m.Groups["Tag"].Value;
string pseudoClassString = m.Groups["PseudoClass"].Value;
CssPseudoClass pseudoClass;
if (pseudoClassString.IsEmpty()) {
pseudoClass = CssPseudoClass.None;
} else {
switch (pseudoClassString.ToLower()) {
case "link":
pseudoClass = CssPseudoClass.Link;
break;
case "visited":
pseudoClass = CssPseudoClass.Visited;
break;
case "hover":
pseudoClass = CssPseudoClass.Hover;
break;
case "active":
pseudoClass = CssPseudoClass.Active;
break;
case "before":
pseudoClass = CssPseudoClass.Before;
break;
case "after":
pseudoClass = CssPseudoClass.After;
break;
case "first-line":
pseudoClass = CssPseudoClass.FirstLine;
break;
case "first-letter":
pseudoClass = CssPseudoClass.FirstLetter;
break;
default:
cssParserTraceSwitch.WriteLine("Unrecognized selector: " + str);
return null;
}
}
string cssClassesString = m.Groups["Classes"].Value;
string[] cssClasses = cssClassesString.IsEmpty() ? EmptyArray<string>.Instance : cssClassesString.Substring(1).Split('.');
allCssClasses.AddRange(cssClasses);
return new CssSelectorElement(
tagName.ToLower(),
cssClasses,
m.Groups["Id"].Value,
pseudoClass);
My first attempt yielded this:
type CssPseudoClass =
| None = 0
| Link = 1
| Visited = 2
| Hover = 3
| Active = 4
| Before = 5
| After = 6
| FirstLine = 7
| FirstLetter = 8
type CssSelectorElement =
{ Tag : string
Id : string
Classes : string list
PseudoClass : CssPseudoClass }
with
static member Default =
{ Tag = "";
Id = "";
Classes = [];
PseudoClass = CssPseudoClass.None; }
open FParsec
let ws = spaces
let str = skipString
let strWithResult str result = skipString str >>. preturn result
let identifier =
let isIdentifierFirstChar c = isLetter c || c = '-'
let isIdentifierChar c = isLetter c || isDigit c || c = '_' || c = '-'
optional (str "-") >>. many1Satisfy2L isIdentifierFirstChar isIdentifierChar "identifier"
let stringFromOptional strOption =
match strOption with
| Some(str) -> str
| None -> ""
let pseudoClassFromOptional pseudoClassOption =
match pseudoClassOption with
| Some(pseudoClassOption) -> pseudoClassOption
| None -> CssPseudoClass.None
let parseCssSelectorElement =
let tag = identifier <?> "tagName"
let id = str "#" >>. identifier <?> "#id"
let classes = many1 (str "." >>. identifier) <?> ".className"
let parseCssPseudoClass =
choiceL [ strWithResult "link" CssPseudoClass.Link;
strWithResult "visited" CssPseudoClass.Visited;
strWithResult "hover" CssPseudoClass.Hover;
strWithResult "active" CssPseudoClass.Active;
strWithResult "before" CssPseudoClass.Before;
strWithResult "after" CssPseudoClass.After;
strWithResult "first-line" CssPseudoClass.FirstLine;
strWithResult "first-letter" CssPseudoClass.FirstLetter]
"pseudo-class"
// (tag?id|tag?classes|tag)pseudoClass?
pipe2 ((pipe2 (opt tag)
id
(fun tag id ->
{ CssSelectorElement.Default with
Tag = stringFromOptional tag;
Id = id })) |> attempt
<|>
(pipe2 (opt tag)
classes
(fun tag classes ->
{ CssSelectorElement.Default with
Tag = stringFromOptional tag;
Classes = classes })) |> attempt
<|>
(tag |>> (fun tag -> { CssSelectorElement.Default with Tag = tag })))
(opt (str ":" >>. parseCssPseudoClass) |> attempt)
(fun selectorElem pseudoClass -> { selectorElem with PseudoClass = pseudoClassFromOptional pseudoClass })
But I'm not really liking how it's shaping up. I was expecting to come up with something easier to understand, but the part parsing (tag?id|tag?classes|tag)pseudoClass? with a few pipe2's and attempt's is really bad.
Came someone with more experience in FParsec educate me on better ways to accomplish this? I'm thinking on trying FSLex/Yacc or Boost.Spirit instead of FParsec is see if I can come up with nicer code with them
Решение
As Mauricio said, if you find yourself repeating code in an FParsec parser, you can always factor out the common parts into a variable or custom combinator. This is one of the great advantages of combinator libraries.
However, in this case you could also simplify and optimize the parser by reorganizing the grammer a bit. You could, for example, replace the lower half of the parseCssSelectorElement
parser with
let defSel = CssSelectorElement.Default
let pIdSelector = id |>> (fun str -> {defSel with Id = str})
let pClassesSelector = classes |>> (fun strs -> {defSel with Classes = strs})
let pSelectorMain =
choice [pIdSelector
pClassesSelector
pipe2 tag (pIdSelector <|> pClassesSelector <|>% defSel)
(fun tagStr sel -> {sel with Tag = tagStr})]
pipe2 pSelectorMain (opt (str ":" >>. parseCssPseudoClass))
(fun sel optPseudo ->
match optPseudo with
| None -> sel
| Some pseudo -> {sel with PseudoClass = pseudo})
By the way, if you want to parse a large number of string constants, it's more efficient to use a dictionary based parsers, like
let pCssPseudoClass : Parser<CssPseudoClass,unit> =
let pseudoDict = dict ["link", CssPseudoClass.Link
"visited", CssPseudoClass.Visited
"hover", CssPseudoClass.Hover
"active", CssPseudoClass.Active
"before", CssPseudoClass.Before
"after", CssPseudoClass.After
"first-line", CssPseudoClass.FirstLine
"first-letter", CssPseudoClass.FirstLetter]
fun stream ->
let reply = identifier stream
if reply.Status <> Ok then Reply(reply.Status, reply.Error)
else
let mutable pseudo = CssPseudoClass.None
if pseudoDict.TryGetValue(reply.Result, &pseudo) then Reply(pseudo)
else // skip to beginning of invalid pseudo class
stream.Skip(-reply.Result.Length)
Reply(Error, messageError "unknown pseudo class")
Другие советы
You could extract some parts of that complex parser to variables, e.g.:
let tagid =
pipe2 (opt tag)
id
(fun tag id ->
{ CssSelectorElement.Default with
Tag = stringFromOptional tag
Id = id })
You could also try using an applicative interface, personally I find it easier to use and think than pipe2:
let tagid =
(fun tag id ->
{ CssSelectorElement.Default with
Tag = stringFromOptional tag
Id = id })
<!> opt tag
<*> id