Can you suggest a more elegant way to 'tokenize' C# code for HTML formatting?

https://stackoverflow.com/questions/228605

04-07-2019

Question

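(This question about refactoring F# code got me one down vote, but also some interesting and useful answers. And 62 F# questions out of the 32,000+ on SO seems pitiful, so I'm going to risk more disapproval!)

I was trying to post a bit of code on a Blogger blog yesterday, and turned to this site, which I had found useful in the past. However, the Blogger editor ate all the style declarations, so that turned out to be a dead end.

So (like any hacker) I thought "How hard can it be?" and rolled my own in <100 lines of F#.

Here is the 'meat' of the code, which turns an input string into a list of 'tokens'. Note that these tokens are not to be confused with lexing/parsing-style tokens. I did look at those briefly, and although I understood hardly anything, I did understand that they would give me only tokens, whereas I want to keep my original string.

The question is: is there a more elegant way of doing this? I don't like the repeated re-definitions of s needed to remove each token string from the input string, but it is difficult to split the string into potential tokens in advance because of things like comments, strings and the #region directive (which contains a non-word character).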

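//namespaces needed by the String annotations and the Regex calls below
open System
open System.Text.RegularExpressions

//Types of tokens we are going to detect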
type Token = 
    | Whitespace of string
    | Comment of string
    | Strng of string
    | Keyword of string
    | Text of string
    | EOF

//turn a string into a list of recognised tokens
let tokenize (s:String) = 
    //this is the 'parser' - should we look at compiling the regexs in advance?
    let nexttoken (st:String) = 
        match st with
        | st when Regex.IsMatch(st, "^\s+") -> Whitespace(Regex.Match(st, "^\s+").Value)
        | st when Regex.IsMatch(st, "^//.*?\r?\n") -> Comment(Regex.Match(st, "^//.*?\r?\n").Value) //this is double slash-style comments
        | st when Regex.IsMatch(st, "^/\*(.|[\r?\n])*?\*/") -> Comment(Regex.Match(st, "^/\*(.|[\r?\n])*?\*/").Value) // /* */ style comments http://ostermiller.org/findcomment.html
        | st when Regex.IsMatch(st, @"^""([^""\\]|\\.|"""")*""") -> Strng(Regex.Match(st, @"^""([^""\\]|\\.|"""")*""").Value) // unescaped = "([^"\\]|\\.|"")*" http://wordaligned.org/articles/string-literals-and-regular-expressions
        | st when Regex.IsMatch(st, "^#(end)?region") -> Keyword(Regex.Match(st, "^#(end)?region").Value)
        | st when st <> "" -> 
                match Regex.Match(st, @"^[^""\s]*").Value with //all text until next whitespace or quote (this may be wrong)
                | x when iskeyword x -> Keyword(x)  //iskeyword uses Microsoft.CSharp.CSharpCodeProvider.IsValidIdentifier - a bit fragile...
                | x -> Text(x)
        | _ -> EOF

    //tail-recursive use of next token to transform string into token list
    let tokeneater s = 
        let rec loop s acc = 
            let t = nexttoken s
            match t with
            | EOF -> List.rev acc //return accumulator (have to reverse it because built backwards with tail recursion)
            | Whitespace(x) | Comment(x) 
            | Keyword(x) | Text(x) | Strng(x) -> 
                loop (s.Remove(0, x.Length)) (t::acc)  //tail recursive
        loop s []

    tokeneater s

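(I'm happy to post the rest of the code if anyone is really interested.)

EDIT: Using active patterns, the central bit looks like this, which is much better!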

let nexttoken (st:String) = 
    match st with
    | Matches "^\s+" s -> Whitespace(s)
    | Matches "^//.*?\r?(\n|$)" s -> Comment(s) //this is double slash-style comments
    | Matches "^/\*(.|[\r?\n])*?\*/" s -> Comment(s) // /* */ style comments http://ostermiller.org/findcomment.html
    | Matches @"^@?""([^""\\]|\\.|"""")*""" s -> Strng(s) // unescaped regexp = ^@?"([^"\\]|\\.|"")*" http://wordaligned.org/articles/string-literals-and-regular-expressions
    | Matches "^#(end)?region" s -> Keyword(s)
    | Matches @"^[^""\s]+" s ->   //all text until next whitespace or quote (this may be wrong)
        match s with
        | IsKeyword x -> Keyword(s)
        | _ -> Text(s)
    | _ -> EOF
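The fragment above relies on `Matches` and `IsKeyword` active patterns that the question does not show. As a rough, self-contained sketch of how the pieces might fit together: the `keywords` set below is a hypothetical stand-in for the original `iskeyword` helper (which reportedly used `CSharpCodeProvider`), and `tokenize` simply repeats the loop from the first listing. It is meant as an illustration under those assumptions, not as the author's actual code.

```fsharp
open System
open System.Text.RegularExpressions

type Token =
    | Whitespace of string
    | Comment of string
    | Strng of string
    | Keyword of string
    | Text of string
    | EOF

// Partial active pattern: yields the matched text when the regex matches.
let (|Matches|_|) (re: string) (s: string) =
    let m = Regex.Match(s, re)
    if m.Success then Some m.Value else None

// ASSUMPTION: a tiny hard-coded keyword set replaces the original iskeyword helper.
let keywords =
    set [ "class"; "public"; "static"; "void"; "int"; "string"; "return"; "using" ]

let (|IsKeyword|_|) (s: string) =
    if keywords.Contains s then Some s else None

// Same patterns as in the question, written as verbatim strings.
let nexttoken (st: string) =
    match st with
    | Matches @"^\s+" s -> Whitespace s
    | Matches @"^//.*?\r?(\n|$)" s -> Comment s
    | Matches @"^/\*(.|[\r?\n])*?\*/" s -> Comment s
    | Matches @"^@?""([^""\\]|\\.|"""")*""" s -> Strng s
    | Matches @"^#(end)?region" s -> Keyword s
    | Matches @"^[^""\s]+" s ->
        match s with
        | IsKeyword _ -> Keyword s
        | _ -> Text s
    | _ -> EOF

// Repeatedly take the next token and drop its text from the front of the input.
let tokenize (s: string) =
    let rec loop (rest: string) acc =
        let t = nexttoken rest
        match t with
        | EOF -> List.rev acc
        | Whitespace x | Comment x | Strng x | Keyword x | Text x ->
            loop (rest.Substring x.Length) (t :: acc)
    loop s []

let tokens = tokenize "public int x; // done"
```

Because every token keeps its exact source text, concatenating the token strings gives back the original input, which is the property the question asks for. Note that an unterminated string literal matches none of the patterns, so this sketch stops early (returns `EOF`) in that case.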


Solution

I'd use an active pattern to encapsulate the Regex.IsMatch and Regex.Match pairs, like so:

let (|Matches|_|) (re: string) (s: string) =
    let m = Regex(re).Match(s)
    //a partial active pattern returns the matched text directly in an option
    if m.Success then Some m.Value else None

Then your nexttoken function can look like:

let nexttoken (st:String) =
    match st with
    | Matches "^\s+" s -> Whitespace(s)
    | Matches "^//.*?\r?\n" s -> Comment(s)
    ...
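On the question's other worry (cutting the front off the input string for every token), one possible variation is to keep the string intact and advance an index instead. The sketch below assumes the .NET `Regex.Match(input, startat)` overload together with the `\G` anchor, which ties a match to the start position (`^` would still mean index 0 of the whole string). The names `rules` and `tokenizeAt` are made up for illustration, and tokens are simplified to (label, text) pairs rather than the `Token` type.

```fsharp
open System.Text.RegularExpressions

// Each rule pairs a label with a regex anchored by \G, so that
// Match(input, pos) can only succeed exactly at position pos.
let rules =
    [ "whitespace", Regex(@"\G\s+")
      "comment",    Regex(@"\G//[^\r\n]*")
      "string",     Regex(@"\G@?""([^""\\]|\\.|"""")*""")
      "text",       Regex(@"\G[^""\s]+") ]

let tokenizeAt (input: string) =
    let rec loop pos acc =
        if pos >= input.Length then List.rev acc
        else
            // Try the rules in order and keep the first non-empty match at pos.
            let hit =
                rules
                |> List.tryPick (fun (label, re) ->
                    let m = re.Match(input, pos)
                    if m.Success && m.Length > 0 then Some (label, m.Value) else None)
            match hit with
            | Some (label, text) -> loop (pos + text.Length) ((label, text) :: acc)
            | None -> List.rev acc // ASSUMPTION: stop at the first unrecognised character
    loop 0 []

let sample = "int x = \"a b\"; // c"
let result = tokenizeAt sample
```

The regexes are also built once rather than on every call, which touches on the "compile the regexes in advance?" comment in the original listing. Whether this reads as more elegant than the active-pattern version is a matter of taste.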

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow