문제

Myello! So I am looking for a concise, efficient an idiomatic way in F# to parse a file or a string. I have a strong preference to treat the input as a sequence of char (char seq). The idea is that every function is responsible to parse a piece of the input, return the converted text tupled with the unused input and be called by a higher level function that chains the unused input to the following functions and use the results to build a compound type. Every parsing function should therefore have a signature similar to this one: char seq -> char seq * 'a . If, for example, the function's responsibility is simply to extract the first word, then, one approach would be the following:

let parseFirstWord (text: char seq) =
  let rec forTailRecursion t acc =
    let c = Seq.head t
    if c = '\n' then
      (t, acc)
    else
      forTailRecursion (Seq.skip 1 t) (c::acc)
  let rest, reversedWord = forTailRecursion text []
  (rest, List.reverse reversedWord)

Now, of course the main problem with this approach is that it extracts the word in reverse order and so you have to reverse it. Its main advantages however are that is uses strictly functional features and proper tail recursion. One could avoid the reversing of the extracted value while losing tail recursion:

let rec parseFirstWord (text: char seq) =
  let c = Seq.head t
  if c = '\n' then
    (t, [])
  else
    let rest, tail = parseFirstWord (Seq.skip 1 t)
    (rest, (c::tail))

Or use a fast mutable data structure underneath instead of using purely functional features, such as:

let parseFirstWord (text: char seq) =
  let rec forTailRecursion t queue =
    let c = Seq.head t
    if c = '\n' then
      (t, queue)
    else
      forTailRecursion (Seq.skip 1 t) (queue.Enqueu(c))
  forTailRecursion text (new Queue<char>())

I have no idea how to use OO concepts in F# mind you so corrections to the above code are welcome.

Being new to this language, I would like to be guided in terms of the usual compromises that an F# developer makes. Among the suggested approaches and your own, which should I consider more idiomatic and why? Also, in that particular case, how would you encapsulate the return value: char seq * char seq, char seq * char list or evenchar seq * Queue<char>? Or would you even consider char seq * String following a proper conversion?

도움이 되었습니까?

해결책

I would definitely have a look at FSLex. FSYacc, FParsec. However if you just want to tokenize a seq<char> you can use a sequence expression to generate tokens in the right order. Reusing your idea of a recursive inner function, and combinining with a sequence expression, we can stay tail recursive like shown below, and avoid non-idiomatic tools like mutable data structures.

I changed the separator char for easy debugging and the signature of the function. This version produces a seq<string> (your tokens) as result, which is probably easier to consume than a tuple with the current token and the rest of the text. If you just want the first token, you can just take the head. Note that the sequence is generated 'on demand', i.e. the input is parsed only as tokens are consumed through the sequence. Should you need the remainder of the input text next to each token, you can yield a pair in loop instead, but I'm guessing the downstream consumer most likely wouldn't (furthermore, if the input text is itself a lazy sequence, possibly linked to a stream, we don't want to expose it as it should be iterated through only in one place).

let parse (text : char seq) = 
    let rec loop t acc = 
        seq {
            if Seq.isEmpty t then yield acc
            else
                let c, rest = Seq.head t, Seq.skip 1 t
                if c = ' ' then 
                    yield acc
                    yield! loop rest ""
                else yield! loop rest (acc + string c)
        }
    loop text ""

parse "The FOX is mine"
val it : seq<string> = seq ["The"; "FOX"; "is"; "mine"]

This is not the only 'idiomatic' way of doing this in F#. Every time we need to process a sequence, we can look at the functions made available in the Seq module. The most general of these is fold which iterates through a sequence once, accumulating a state at each element by running a given function. In the example below accumulate is such a function, that progressively builds the resulting sequence of tokens. Since Seq.fold doesn't run the accumulator function on an empty sequence, we need the last two lines to extract the last token from the function's internal accumulator.
This second implementation keeps the nice characteriestics of the first, i.e. tail recursion (inside the fold implementation, if I'm not mistaken) and processing of the input sequence on demand. It also happens to be shorter, albeit a bit less readable probably.

let parse2 (text : char seq) =
    let accumulate (res, acc) c =
        if c = ' ' then (Seq.append res (Seq.singleton acc), "")
        else (res, acc + string c)
    let (acc, last) = text |> Seq.fold accumulate (Seq.empty, "")
    Seq.append acc (Seq.singleton last)

parse2 "The FOX is mine"
val it : seq<string> = seq ["The"; "FOX"; "is"; "mine"]

다른 팁

One way of lexing/parsing in a way truly unique to F# is by using active patterns. The following simplified example shows the general idea. It can process a calculation string of arbitrary length without producing a stack overflow.

let rec (|CharOf|_|) set = function
    | c :: rest when Set.contains c set -> Some(c, rest)
    | ' ' :: CharOf set (c, rest) -> Some(c, rest)
    | _ -> None

let rec (|CharsOf|) set = function
    | CharOf set (c, CharsOf set (cs, rest)) -> c::cs, rest
    | rest -> [], rest

let (|StringOf|_|) set = function
    | CharsOf set (_::_ as cs, rest) -> Some(System.String(Array.ofList cs), rest)
    | _ -> None

type Token =
    | Int of int
    | Add | Sub | Mul | Div | Mod
    | Unknown

let lex: string -> _ =
    let digits = set ['0'..'9']
    let ops = Set.ofSeq  "+-*/%"

    let rec lex chars =
        seq { match chars with
              | StringOf digits (s, rest) -> yield Int(int s); yield! lex rest
              | CharOf ops (c, rest) -> 
                  let op = 
                      match c with
                      | '+' -> Add | '-' -> Sub | '*' -> Mul | '/' -> Div | '%' -> Mod
                      | _ -> failwith "invalid operator char"
                  yield op; yield! lex rest
              | [] -> ()
              | _ -> yield Unknown }

    List.ofSeq >> lex

lex "1234 + 514 / 500"
// seq [Int 1234; Add; Int 514; Div; Int 500]
라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top