Problem

To get case-insensitive infix operators with OperatorPrecedenceParser, I'm preprocessing the input by parsing it as text delimited by string literals. The text portions are then searched for infix operators, which are uppercased to match the operators as registered with the OPP. The actual parsing then takes place on the preprocessed string.

My question is: can both phases be combined into a single parser? I tried:

// preprocess: Parser<string,_>
// scalarExpr: Parser<ScalarExpr,_>
let filter = (preprocess .>> eof) >>. (scalarExpr .>> eof)

but it fails at the end of the input, seemingly expecting a scalarExpr. The input can be parsed by preprocess and scalarExpr independently, so I'm guessing it's an issue with eof, but I can't seem to get it right. Is this possible?

Here are the other parsers for reference.

open System
open System.Collections.Generic
open System.Text.RegularExpressions
open FParsec

let stringLiteral =
  let subString = manySatisfy ((<>) '"')
  let escapedQuote = stringReturn "\"\"" "\""
  (between (pstring "\"") (pstring "\"") (stringsSepBy subString escapedQuote)) 

let canonicalizeKeywords =
  let keywords = 
    [
      "OR"
      "AND"
      "CONTAINS"
      "STARTSWITH"
      "ENDSWITH"
    ]
  let caseInsensitiveKeywords = HashSet(keywords, StringComparer.InvariantCultureIgnoreCase)
  fun text ->
    let re = Regex(@"([\w][\w']*\w)")
    re.Replace(text, MatchEvaluator(fun m ->
      if caseInsensitiveKeywords.Contains(m.Value) then m.Value.ToUpperInvariant()
      else m.Value))

let preprocess = 
  stringsSepBy 
    ((manySatisfy ((<>) '"')) |>> canonicalizeKeywords) 
    (stringLiteral |>> (fun s -> "\"" + s + "\"")) 

Solution

The simplest way to parse case-insensitive operators with FParsec's OperatorPrecedenceParser is to add an operator definition for every casing you want to support. If you only need short operator names such as "and" or "or", you can simply add all possible case combinations. If your operator names are too long for that, you might consider supporting only the sane casings, i.e. lowercase, UPPERCASE, camelCase and PascalCase. When you support multiple casings, it is usually convenient to write a helper function that generates all the needed casings from a standard one.
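A minimal sketch of such a helper, assuming the standard name is written in camelCase (e.g. `startsWith`). The names `allCasings` and `saneCasings` are hypothetical, not part of FParsec. `allCasings` yields every upper/lowercase combination (2^n strings for n letters, so it only suits short names), while `saneCasings` yields just the four common casings:

```fsharp
open System

/// Every upper/lowercase combination of `name`, e.g. "or" -> or, oR, Or, OR.
/// Yields 2^n strings for n letters, so only use it for short names.
let allCasings (name: string) : string list =
    let rec go (chars: char list) : string list =
        match chars with
        | [] -> [ "" ]
        | c :: rest ->
            let tails = go rest
            // List.distinct keeps a single variant for non-letters such as '_'.
            [ Char.ToLowerInvariant c; Char.ToUpperInvariant c ]
            |> List.distinct
            |> List.collect (fun v -> tails |> List.map (fun t -> string v + t))
    go (List.ofSeq name)

/// lowercase, UPPERCASE, camelCase and PascalCase variants of a name
/// given in camelCase, e.g. "startsWith"; duplicates are removed.
let saneCasings (camelName: string) : string list =
    let pascal =
        if camelName = "" then camelName
        else string (Char.ToUpperInvariant camelName.[0]) + camelName.Substring(1)
    [ camelName.ToLowerInvariant()
      camelName.ToUpperInvariant()
      camelName
      pascal ]
    |> List.distinct
```

Each resulting string could then be registered as its own operator, e.g. `opp.AddOperator(InfixOperator(s, ws, 1, Associativity.Left, mapping))` for each `s` in `saneCasings "startsWith"`, where `opp`, `ws`, the precedence and `mapping` stand in for your own definitions.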

If you have long operator names and really want to support all casings, the OperatorPrecedenceParser's dynamic configurability (operators can be added to an existing instance) also allows the following approach, which should be easier and more efficient than transforming the input:

  1. Search the input for all case-insensitive occurrences of the supported operators. This search must not miss any occurrence, but false positives are harmless, e.g. when an operator name appears inside a function name or a string literal.
  2. Add all unique casings you found in step 1 to the OperatorPrecedenceParser. (Usually there won't be many casings of the same operator.)
  3. Parse the input with the configured OperatorPrecedenceParser.
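Step 1 of the procedure above can be sketched as follows, assuming operator names are plain non-empty strings and using an ordinal case-insensitive comparison (the function name `findCasings` is hypothetical). It deliberately ignores word and string-literal boundaries, in line with the point that false positives are acceptable:

```fsharp
open System

/// Step 1: every distinct casing in which one of `operators` occurs in `input`.
/// The search ignores word and string-literal boundaries on purpose, so it
/// never misses an occurrence; hits inside identifiers are harmless extras.
let findCasings (operators: string list) (input: string) : Set<string> =
    operators
    |> List.filter (fun op -> op <> "") // an empty name would match everywhere
    |> List.collect (fun op ->
        // Collect the exact text of each case-insensitive match; restarting one
        // character after each hit also catches overlapping occurrences.
        let rec loop (start: int) (acc: string list) =
            let i = input.IndexOf(op, start, StringComparison.OrdinalIgnoreCase)
            if i < 0 then acc
            else loop (i + 1) (input.Substring(i, op.Length) :: acc)
        loop 0 [])
    |> Set.ofList
```

For steps 2 and 3 you would then add one operator per returned casing to your OperatorPrecedenceParser (e.g. via `opp.AddOperator`) before running the expression parser on the same, untransformed input.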

When you parse multiple inputs, you can keep the OperatorPrecedenceParser instance around and lazily add new operator casings as you need them.
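A sketch of that lazy registration, assuming a hypothetical `register` callback that adds an operator with the given casing to your long-lived OperatorPrecedenceParser. The returned function takes the casings found in one input and registers only those it has not seen before:

```fsharp
open System
open System.Collections.Generic

/// Wraps a `register` callback (hypothetical; in practice it would add an
/// operator with that casing to the shared OperatorPrecedenceParser) so that
/// each casing is registered at most once across all inputs.
/// The returned function registers the new casings and returns them.
let makeLazyRegistrar (register: string -> unit) : seq<string> -> string list =
    let known = HashSet<string>(StringComparer.Ordinal)
    fun casings ->
        [ for casing in casings do
              // HashSet.Add returns false if the casing was already known.
              if known.Add casing then
                  register casing
                  yield casing ]
```

This sketch is not thread-safe, since `HashSet<T>` isn't; keep one registrar per OperatorPrecedenceParser instance.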

License: CC-BY-SA with attribution
Not affiliated with StackOverflow