Using preprocessing function with identifier parser in FParsec?

https://stackoverflow.com/questions/9230158

28-04-2021
|

문제

I am using the identifier parser from FParsec to parse the names of variables and functions, which are normally a mixture of Unicode and ASCII characters. But sometimes I have escaped Unicode characters in the beginning (like \u03C0) or within the identifier (like swipe_board\u003A_b). I still can make them parseable using isAsciiIdStart and isAsciiIdContinue options, but I can't define my own custom function for pre-processing before normalization. What could be a solution here?

해결책

The identifier parser internally first parses a string and then passes it to an IdentifierValidator instance for validation. Since the C# IdentifierValidator class is publicly accessible (though not documented), you could easily adapt the identifier parser to your needs (by making the initial string parsing step also recognize the escapes).

The identifier parsing is a bit complicated due to support for UTF-16 surrogate pairs, normalization and the Unicode XID character category, which is not natively supported on .NET. Maybe you only need to support ASCII or UCS-2 identifiers specified in term of character categories supported by CharUnicodeInfo.GetUnicodeCategory, in which case you could probably implement the parsing and validation in just one step using many1Satisfy2 or many1Chars2.

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 StackOverflow