I have a hand-written CSS parser done in C# which is getting unmanageable and was trying to do it i FParsec to make it more mantainable. Here's a snippet that parses a css selector element made with regexes:

var tagRegex = @"(?<Tag>(?:[a-zA-Z][_\-0-9a-zA-Z]*|\*))";
var idRegex = @"(?:#(?<Id>[a-zA-Z][_\-0-9a-zA-Z]*))";
var classesRegex = @"(?<Classes>(?:\.[a-zA-Z][_\-0-9a-zA-Z]*)+)";
var pseudoClassRegex = @"(?::(?<PseudoClass>link|visited|hover|active|before|after|first-line|first-letter))";
var selectorRegex = new Regex("(?:(?:" + tagRegex + "?" + idRegex + ")|" +
                                 "(?:" + tagRegex + "?" + classesRegex + ")|" +
                                  tagRegex + ")" +
                               pseudoClassRegex + "?");

var m = selectorRegex.Match(str);

if (m.Length != str.Length) {
    cssParserTraceSwitch.WriteLine("Unrecognized selector: " + str);
    return null;
}

string tagName = m.Groups["Tag"].Value;

string pseudoClassString = m.Groups["PseudoClass"].Value;
CssPseudoClass pseudoClass;
if (pseudoClassString.IsEmpty()) {
    pseudoClass = CssPseudoClass.None;
} else {
    switch (pseudoClassString.ToLower()) {
        case "link":
            pseudoClass = CssPseudoClass.Link;
            break;
        case "visited":
            pseudoClass = CssPseudoClass.Visited;
            break;
        case "hover":
            pseudoClass = CssPseudoClass.Hover;
            break;
        case "active":
            pseudoClass = CssPseudoClass.Active;
            break;
        case "before":
            pseudoClass = CssPseudoClass.Before;
            break;
        case "after":
            pseudoClass = CssPseudoClass.After;
            break;
        case "first-line":
            pseudoClass = CssPseudoClass.FirstLine;
            break;
        case "first-letter":
            pseudoClass = CssPseudoClass.FirstLetter;
            break;
        default:
            cssParserTraceSwitch.WriteLine("Unrecognized selector: " + str);
            return null;
    }
}

string cssClassesString = m.Groups["Classes"].Value;
string[] cssClasses = cssClassesString.IsEmpty() ? EmptyArray<string>.Instance : cssClassesString.Substring(1).Split('.');
allCssClasses.AddRange(cssClasses);

return new CssSelectorElement(
    tagName.ToLower(),
    cssClasses,
    m.Groups["Id"].Value,
    pseudoClass);

My first attempt yielded this:

type CssPseudoClass =
    | None = 0
    | Link = 1
    | Visited = 2
    | Hover = 3
    | Active = 4
    | Before = 5
    | After = 6
    | FirstLine = 7
    | FirstLetter = 8

type CssSelectorElement = 
    { Tag : string
      Id : string
      Classes : string list    
      PseudoClass : CssPseudoClass } 
with
    static member Default = 
        { Tag = "";
          Id = "";
          Classes = [];
          PseudoClass = CssPseudoClass.None; }

open FParsec

let ws = spaces
let str = skipString
let strWithResult str result = skipString str >>. preturn result

let identifier =
    let isIdentifierFirstChar c = isLetter c || c = '-'
    let isIdentifierChar c = isLetter c || isDigit c || c = '_' || c = '-'    
    optional (str "-") >>. many1Satisfy2L isIdentifierFirstChar isIdentifierChar "identifier"

let stringFromOptional strOption =
    match strOption with
    | Some(str) -> str
    | None -> ""

let pseudoClassFromOptional pseudoClassOption =
    match pseudoClassOption with
    | Some(pseudoClassOption) -> pseudoClassOption
    | None -> CssPseudoClass.None

let parseCssSelectorElement =
    let tag = identifier <?> "tagName"
    let id = str "#" >>. identifier <?> "#id"
    let classes = many1 (str "." >>. identifier) <?> ".className"
    let parseCssPseudoClass =
        choiceL [ strWithResult "link" CssPseudoClass.Link;
                  strWithResult "visited" CssPseudoClass.Visited;
                  strWithResult "hover" CssPseudoClass.Hover;
                  strWithResult "active" CssPseudoClass.Active;
                  strWithResult "before" CssPseudoClass.Before;
                  strWithResult "after" CssPseudoClass.After;
                  strWithResult "first-line" CssPseudoClass.FirstLine;
                  strWithResult "first-letter" CssPseudoClass.FirstLetter]
                 "pseudo-class"    
    // (tag?id|tag?classes|tag)pseudoClass?
    pipe2 ((pipe2 (opt tag) 
                  id
                  (fun tag id -> 
                      { CssSelectorElement.Default with 
                          Tag = stringFromOptional tag;
                          Id = id })) |> attempt
           <|>
           (pipe2 (opt tag) 
                  classes
                  (fun tag classes -> 
                      { CssSelectorElement.Default with 
                          Tag = stringFromOptional tag;
                          Classes = classes })) |> attempt
           <|>
           (tag |>> (fun tag -> { CssSelectorElement.Default with Tag = tag })))
           (opt (str ":" >>. parseCssPseudoClass) |> attempt)
           (fun selectorElem pseudoClass -> { selectorElem with PseudoClass = pseudoClassFromOptional pseudoClass })

But I'm not really liking how it's shaping up. I was expecting to come up with something easier to understand, but the part parsing (tag?id|tag?classes|tag)pseudoClass? with a few pipe2's and attempt's is really bad.

Came someone with more experience in FParsec educate me on better ways to accomplish this? I'm thinking on trying FSLex/Yacc or Boost.Spirit instead of FParsec is see if I can come up with nicer code with them

有帮助吗?

解决方案

As Mauricio said, if you find yourself repeating code in an FParsec parser, you can always factor out the common parts into a variable or custom combinator. This is one of the great advantages of combinator libraries.

However, in this case you could also simplify and optimize the parser by reorganizing the grammer a bit. You could, for example, replace the lower half of the parseCssSelectorElement parser with

let defSel = CssSelectorElement.Default

let pIdSelector = id |>> (fun str -> {defSel with Id = str})
let pClassesSelector = classes |>> (fun strs -> {defSel with Classes = strs})

let pSelectorMain = 
     choice [pIdSelector 
             pClassesSelector 
             pipe2 tag (pIdSelector <|> pClassesSelector <|>% defSel)
                   (fun tagStr sel -> {sel with Tag = tagStr})]

pipe2 pSelectorMain (opt (str ":" >>. parseCssPseudoClass))
      (fun sel optPseudo ->
           match optPseudo with
           | None -> sel
           | Some pseudo -> {sel with PseudoClass = pseudo})

By the way, if you want to parse a large number of string constants, it's more efficient to use a dictionary based parsers, like

let pCssPseudoClass : Parser<CssPseudoClass,unit> =
    let pseudoDict = dict ["link", CssPseudoClass.Link
                           "visited", CssPseudoClass.Visited
                           "hover", CssPseudoClass.Hover
                           "active", CssPseudoClass.Active
                           "before", CssPseudoClass.Before
                           "after", CssPseudoClass.After
                           "first-line", CssPseudoClass.FirstLine
                           "first-letter", CssPseudoClass.FirstLetter]        
    fun stream ->
        let reply = identifier stream            
        if reply.Status <> Ok then Reply(reply.Status, reply.Error)
        else 
            let mutable pseudo = CssPseudoClass.None
            if pseudoDict.TryGetValue(reply.Result, &pseudo) then Reply(pseudo)
            else // skip to beginning of invalid pseudo class                   
                stream.Skip(-reply.Result.Length)
                Reply(Error, messageError "unknown pseudo class")

其他提示

You could extract some parts of that complex parser to variables, e.g.:

let tagid = 
    pipe2 (opt tag) 
      id
      (fun tag id -> 
          { CssSelectorElement.Default with 
              Tag = stringFromOptional tag
              Id = id })

You could also try using an applicative interface, personally I find it easier to use and think than pipe2:

let tagid =
    (fun tag id -> 
          { CssSelectorElement.Default with 
              Tag = stringFromOptional tag
              Id = id })
    <!> opt tag
    <*> id
许可以下: CC-BY-SA归因
不隶属于 StackOverflow
scroll top