I'd just use `start >>. everythingUntil end` instead of `between start end body`.
The following implementation is relatively close to the logic in the regex:
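    open FParsec

    let maxInt = System.Int32.MaxValue

    type LiquidTag = LiquidTag of string * string

    // Skips up to and including the next occurrence of str; fails if str is not found.
    let skipTillString str = skipCharsTillString str true maxInt

    // Skips up to (but not including) the next occurrence of str,
    // or to the end of the input if str does not occur. Never fails.
    let skipTillStringOrEof str : Parser<unit, _> =
        fun stream ->
            let mutable found = false
            stream.SkipCharsOrNewlinesUntilString(str, maxInt, &found) |> ignore
            Reply(())

    let openingBrace = skipString "{%" >>. spaces

    // Matches the name only if it is followed by whitespace or '%',
    // so "examplecode" does not match a longer name such as "examplecodes".
    let tagName name =
        skipString name
        >>? nextCharSatisfies (fun c -> c = '%' || System.Char.IsWhiteSpace(c))

    // Matches "{% end<name> %}". Thanks to >>?, a mismatch in the tag name
    // backtracks to before the "{%".
    let endTag name =
        openingBrace >>? (tagName ("end" + name) >>. (spaces >>. skipString "%}"))

    // Parses the rest of an opening tag and the body up to the matching end tag.
    let tagPair_afterOpeningBrace name =
        tagName name >>. skipTillString "%}"
        >>. (manyCharsTill anyChar (endTag name)
             |>> fun str -> LiquidTag(name, str))

    let skipToOpeningBraceOrEof = skipTillStringOrEof "{%"

    // Collects all examplecode/requiredcode pairs in the input and ignores everything else.
    let tagPairs =
        skipToOpeningBraceOrEof
        >>. many (openingBrace
                  >>. opt (    tagPair_afterOpeningBrace "examplecode"
                           <|> tagPair_afterOpeningBrace "requiredcode")
                  .>> skipToOpeningBraceOrEof)
            |>> List.choose id
        .>> eof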
Some notes:
- I only parse the two Liquid statements you're interested in. This makes a difference if one of these statements is nested inside a statement you're not interested in. It also has the advantage that no parsers have to be constructed while the parser is running.
- I'm using the `>>?` combinator to control when exactly backtracking may occur.
- The performance of this implementation will not be great, but there are various ways to optimize it if necessary. The slowest component will probably be the `manyCharsTill anyChar (endTag name)` parser, which could easily be replaced with a custom primitive. The `many ... |>> List.choose id` in `tagPairs` could also easily be replaced with a more efficient custom combinator.
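As a rough illustration of the first optimization, here is one possible replacement for `manyCharsTill anyChar (endTag name)`. It is not the custom primitive mentioned above, only a combinator-level sketch: the helper name `bodyUntil` is made up for this example, and I haven't benchmarked it. The idea is to read runs of non-`{` characters in one go and to try the end-tag parser only where a `{` actually occurs, instead of at every character.

```fsharp
// In an .fsx script, reference the package first:  #r "nuget: FParsec"
open FParsec

// Hypothetical helper (not part of the code above): parses everything up to
// endP, consumes endP, and returns the text that preceded it.
let bodyUntil (endP: Parser<unit, 'u>) : Parser<string, 'u> =
    // Fast path: a run of characters that cannot start a tag.
    let plainChunk = many1Satisfy (fun c -> c <> '{')
    // A '{' that does not begin the end tag is ordinary content.
    let loneBrace = notFollowedBy endP >>. pstring "{"
    manyStrings (plainChunk <|> loneBrace) .>> endP

// Stand-alone demo with a fixed end marker standing in for `endTag name`.
let demo = run (bodyUntil (skipString "{% end %}")) "a { b {% end %} rest"

match demo with
| Success (body, _, _) -> printfn "body = %s" body
| Failure (msg, _, _) -> printfn "error: %s" msg
```

In the parser above this would be used as `bodyUntil (endTag name)` in place of `manyCharsTill anyChar (endTag name)`. Like the original, it fails if the end tag never appears.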