Can you suggest a more elegant way to "tokenize" C# code for HTML formatting?

https://stackoverflow.com/questions/228605

04-07-2019
Question

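(This question about refactoring F# code earned me one down vote, but also some interesting and useful answers. And 62 F# questions out of the 32,000+ on SO seems pitiful, so I'm going to risk more disapproval!)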

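Yesterday I was trying to post a bit of code on a Blogger blog, and I turned to this site, which I had found useful in the past. However, the Blogger editor ate all the style declarations, so that turned out to be a dead end.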

So (like any hacker) I thought "How hard can it be?" and rolled my own in <100 lines of F#.

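Here is the "meat" of the code, which turns an input string into a list of "tokens". Note that these tokens aren't to be confused with lexing/parsing-style tokens. I did look at those briefly, and although I understood hardly anything, I did understand that they would give me only the tokens, whereas I want to keep my original string.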

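The question is: is there a more elegant way of doing this? I don't like the n re-definitions of s required to remove each token string from the input string, but it's difficult to split the string into potential tokens in advance, because of things like comments, strings and the #region directive (which contains a non-word character).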

open System
open System.Text.RegularExpressions

//Types of tokens we are going to detect
type Token = 
    | Whitespace of string
    | Comment of string
    | Strng of string
    | Keyword of string
    | Text of string
    | EOF

//turn a string into a list of recognised tokens
let tokenize (s:String) = 
    //this is the 'parser' - should we look at compiling the regexs in advance?
    let nexttoken (st:String) = 
        match st with
        | st when Regex.IsMatch(st, "^\s+") -> Whitespace(Regex.Match(st, "^\s+").Value)
        | st when Regex.IsMatch(st, "^//.*?\r?\n") -> Comment(Regex.Match(st, "^//.*?\r?\n").Value) //this is double slash-style comments
        | st when Regex.IsMatch(st, "^/\*(.|[\r?\n])*?\*/") -> Comment(Regex.Match(st, "^/\*(.|[\r?\n])*?\*/").Value) // /* */ style comments http://ostermiller.org/findcomment.html
        | st when Regex.IsMatch(st, @"^""([^""\\]|\\.|"""")*""") -> Strng(Regex.Match(st, @"^""([^""\\]|\\.|"""")*""").Value) // unescaped = "([^"\\]|\\.|"")*" http://wordaligned.org/articles/string-literals-and-regular-expressions
        | st when Regex.IsMatch(st, "^#(end)?region") -> Keyword(Regex.Match(st, "^#(end)?region").Value)
        | st when st <> "" -> 
                match Regex.Match(st, @"^([^""\s]+|"")").Value with //all text until next whitespace or quote; a stray quote is taken on its own so the loop always advances (this may be wrong)
                | x when iskeyword x -> Keyword(x)  //iskeyword uses Microsoft.CSharp.CSharpCodeProvider.IsValidIdentifier - a bit fragile...
                | x -> Text(x)
        | _ -> EOF

    //tail-recursive use of next token to transform string into token list
    let tokeneater s = 
        let rec loop s acc = 
            let t = nexttoken s
            match t with
            | EOF -> List.rev acc //return accumulator (have to reverse it because built backwards with tail recursion)
            | Whitespace(x) | Comment(x) 
            | Keyword(x) | Text(x) | Strng(x) -> 
                loop (s.Remove(0, x.Length)) (t::acc)  //tail recursive
        loop s []

    tokeneater s

(I'd be glad to post the rest of the code, if anyone is really interested.)

Edit: Using the active pattern suggested by kvb, the central bit now looks like this:

let nexttoken (st:String) = 
    match st with
    | Matches "^\s+" s -> Whitespace(s)
    | Matches "^//.*?\r?(\n|$)" s -> Comment(s) //this is double slash-style comments
    | Matches "^/\*(.|[\r?\n])*?\*/" s -> Comment(s)  // /* */ style comments http://ostermiller.org/findcomment.html
    | Matches @"^@?""([^""\\]|\\.|"""")*""" s -> Strng(s) // unescaped regexp = ^@?"([^"\\]|\\.|"")*" http://wordaligned.org/articles/string-literals-and-regular-expressions
    | Matches "^#(end)?region" s -> Keyword(s) 
    | Matches @"^([^""\s]+|"")" s ->   //all text until next whitespace or quote, or a stray quote on its own (this may be wrong)
            match s with
            | IsKeyword x -> Keyword(s)
            | _ -> Text(s)
    | _ -> EOF
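The IsKeyword pattern used above is not shown in the post; the original iskeyword is described only as relying on Microsoft.CSharp.CSharpCodeProvider.IsValidIdentifier. As a rough illustration of what such a partial active pattern could look like, here is a sketch that swaps in a plain set lookup instead. The keyword list, the csharpKeywords name and the classify helper are all assumptions made for the example, and the list is deliberately incomplete.

```fsharp
// Assumption: a small, deliberately incomplete sample of C# keywords.
let csharpKeywords =
    set [ "class"; "int"; "namespace"; "public"; "return"; "static"; "string"; "using"; "void" ]

// Partial active pattern: succeeds (returning the word) when it is in the set.
let (|IsKeyword|_|) (word: string) =
    if Set.contains word csharpKeywords then Some word else None

// Hypothetical helper showing how the pattern reads at a use site.
let classify (word: string) =
    match word with
    | IsKeyword k -> sprintf "Keyword(%s)" k
    | other -> sprintf "Text(%s)" other

printfn "%s" (classify "public")
printfn "%s" (classify "Console")
```

A fixed set is easier to reason about than the identifier check, at the cost of having to maintain the list by hand.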

Solution

I'd use an active pattern to encapsulate the Regex.IsMatch and Regex.Match pairs, like so:

let (|Matches|_|) re s =
  let m = Regex(re).Match(s)
  if m.Success then
    Some m.Value
  else
    None
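To try the pattern in isolation, a small self-contained sketch may help. The describe function and its sample inputs are hypothetical and exist only to show how the matched text is bound at each case; the regular expressions are simplified versions of the ones in the question.

```fsharp
open System.Text.RegularExpressions

// Partial active pattern: succeeds with the matched text, otherwise None.
let (|Matches|_|) (re: string) (s: string) =
    let m = Regex.Match(s, re)
    if m.Success then Some m.Value else None

// Hypothetical helper, for demonstration only: names what s starts with.
let describe (s: string) =
    match s with
    | Matches @"^\s+" ws -> sprintf "whitespace of length %d" ws.Length
    | Matches @"^//[^\n]*" c -> sprintf "comment: %s" c
    | _ -> "something else"

printfn "%s" (describe "   int x;")
printfn "%s" (describe "// hello\nint x;")
printfn "%s" (describe "int x;")
```

Each case runs its regex once and binds the matched text, which is what removes the IsMatch/Match duplication.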

Then your nexttoken function could look like this:

let nexttoken (st:String) =
  match st with
  | Matches "^\s+" s -> Whitespace(s)
  | Matches "^//.*?\r?\n" s -> Comment(s)
  ...
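The question also mentions disliking the repeated re-definition of s. One possible way around that, sketched here as an assumption rather than as part of the answer above, is to keep the input string intact and track a position instead: Regex.Match(input, startat) combined with the \G anchor only matches at the given index. The rules table, the simplified regular expressions and the text helper are illustrative names, and keyword detection is left out to keep the sketch short.

```fsharp
open System.Text.RegularExpressions

type Token =
    | Whitespace of string
    | Comment of string
    | Strng of string
    | Text of string

// Each rule pairs a regex with the token case it builds.
// \G anchors the match at the start index given to Regex.Match,
// so no prefix has to be removed from the input.
let rules : (Regex * (string -> Token)) list =
    [ (Regex(@"\G\s+"), Whitespace)
      (Regex(@"\G//[^\n]*\n?"), Comment)                 // double-slash comments
      (Regex(@"\G/\*[\s\S]*?\*/"), Comment)              // /* */ comments
      (Regex(@"\G@?""([^""\\]|\\.|"""")*"""), Strng)     // string literals
      (Regex(@"\G([^""\s]+|"")"), Text) ]                // anything else, or a stray quote

// Extract the original text carried by a token.
let text = function
    | Whitespace x | Comment x | Strng x | Text x -> x

let tokenize (input: string) =
    let rec loop pos acc =
        if pos >= input.Length then List.rev acc
        else
            // Try the rules in order and keep the first non-empty match at pos.
            let hit =
                rules
                |> List.tryPick (fun (re, make) ->
                    let m = re.Match(input, pos)
                    if m.Success && m.Length > 0 then Some (make m.Value) else None)
            match hit with
            | Some tok -> loop (pos + (text tok).Length) (tok :: acc)
            | None -> List.rev acc // nothing matched: stop rather than loop forever
    loop 0 []

// Round trip: concatenating the token texts should give back the input.
let source = "int x = 1; // one\nstring s = \"a b\";"
let tokens = tokenize source
let rebuilt = tokens |> List.map text |> String.concat ""
printfn "round trip ok: %b" (rebuilt = source)
```

Because the Regex objects are built once in the table, this also touches on the question's comment about compiling the regexes in advance; RegexOptions.Compiled could be passed to the constructors if matching speed turns out to matter.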

License: CC-BY-SA with attribution

Not affiliated with StackOverflow