Question

I have this "learning code" I wrote for the morris seq in f# that suffers from stack overflow that I don't know how to avoid. "morris" returns an infinite sequence of "see and say" sequences (i.e., {{1}, {1,1}, {2,1}, {1,2,1,1}, {1,1,1,2,2,1}, {3,1,2,2,1,1},...}).

    let printList l =
        Seq.iter (fun n -> printf "%i" n) l
        printfn ""

    let rec morris s = 
        let next str = seq {
            let cnt = ref 1  // Stack overflow is below when enumerating
            for cur in [|0|] |> Seq.append str |> Seq.windowed 2 do
                if cur.[0] <> cur.[1] then
                    yield!( [!cnt ; cur.[0]] )
                    cnt := 0
                incr cnt
        }
        seq {
        yield s
        yield! morris (next s) // tail recursion, no stack overflow
        }

    // "main"
    // Print the nth iteration
    let _ =  [1] |> morris |> Seq.nth 3125 |> printList 

You can pick off the nth iteration using Seq.nth but you can only get so far before you hit a stack overflow. The one bit of recursion I have is tail recursion and it in essence builds a linked set of enumerators. That's not where the problem is. It's when "enum" is called on the say the 4000th sequence. Note that's with F# 1.9.6.16, the previous version topped out above 14000). It's because the way the linked sequences are resolved. The sequences are lazy and so the "recursion" is lazy. That is, seq n calls seq n-1 which calls seq n-2 and so forth to get the first item (the very first # is the worst case).

I understand that [|0|] |> Seq.append str |> Seq.windowed 2, is making my problem worse and I could triple the # I could generate if I eliminated that. Practically speaking the code works well enough. The 3125th iteration of morris would be over 10^359 characters in length.

The problem I'm really trying to solve is how to retain the lazy eval and have a no limit based on stack size for the iteration I can pick off. I'm looking for the proper F# idiom to make the limit based on memory size.

Update Oct '10

After learning F# a bit better, a tiny bit of Haskell, thinking & investigating this problem for over year, I finally can answer my own question. But as always with difficult problems, the problem starts with it being the wrong question. The problem isn't sequences of sequences - it's really because of a recursively defined sequence. My functional programming skills are a little better now and so it's easier to see what's going on with the version below, which still gets a stackoverflow

let next str = 
    Seq.append str [0]
    |> Seq.pairwise
    |> Seq.scan (fun (n,_) (c,v) ->
            if (c = v) then (n+1,Seq.empty)
            else (1,Seq.ofList [n;c]) ) (1,Seq.empty)
    |> Seq.collect snd

let morris = Seq.unfold(fun sq -> Some(sq,next sq))

That basicially creates a really long chain of Seq processing function calls to generate the sequnces. The Seq module that comes with F# is what can't follow the chain without using the stack. There's an optimization it uses for append and recursively defined sequences, but that optimization only works if the recursion is implementing an append.

So this will work

let rec ints n = seq { yield n; yield! ints (n+1) }
printf "%A" (ints 0 |> Seq.nth 100000);;

And this one will get a stackoverflow.

let rec ints n = seq { yield n; yield! (ints (n+1)|> Seq.map id) }
printf "%A" (ints 0 |> Seq.nth 100000);;

To prove the F# libary was the issue, I wrote my own Seq module that implemented append, pairwise, scan and collect using continutions and now I can begin generating and printing out the 50,000 seq without a problem (it'll never finish since it's over 10^5697 digits long).

Some additional notes:

  • Continuations were the idiom I was looking for, but in this case, they had to go into the F# library, not my code. I learned about continuations in F# from Tomas Petricek's Real-World Functional Programming book.
  • The lazy list answer that I accepted held the other idiom; lazy evaluation. In my rewritten library, I also had to leverage the lazy type to avoid stackoverflow.
  • The lazy list version sorta of works by luck (maybe by design but that's beyond my current ability to determine) - the active-pattern matching it uses while it's constructing and iterating causes the lists to calculate values before the required recursion gets too deep, so it's lazy, but not so lazy it needs continuations to avoid stackoverflow. For example, by the time the 2nd sequence needs a digit from the 1st sequence, it's already been calculated. In other words, the LL version is not strictly JIT lazy for sequence generation, only list management.
Was it helpful?

Solution

You should definitely check out

http://research.microsoft.com/en-us/um/cambridge/projects/fsharp/manual/FSharp.PowerPack/Microsoft.FSharp.Collections.LazyList.html

but I will try to post a more comprehensive answer later.

UPDATE

Ok, a solution is below. It represents the Morris sequence as a LazyList of LazyLists of int, since I presume you want it to be lazy in 'both directions'.

The F# LazyList (in the FSharp.PowerPack.dll) has three useful properties:

  • it is lazy (evaluation of the nth element will not happen until it is first demanded)
  • it does not recompute (re-evaluation of the nth element on the same object instance will not recompute it - it caches each element after it's first computed)
  • you can 'forget' prefixes (as you 'tail' into the list, the no-longer-referenced prefix is available for garbage collection)

The first property is common with seq (IEnumerable), but the other two are unique to LazyList and very useful for computational problems such as the one posed in this question.

Without further ado, the code:

// print a lazy list up to some max depth
let rec PrintList n ll =
    match n with
    | 0 -> printfn ""
    | _ -> match ll with
           | LazyList.Nil -> printfn ""
           | LazyList.Cons(x,xs) ->
               printf "%d" x
               PrintList (n-1) xs

// NextMorris : LazyList<int> -> LazyList<int>
let rec NextMorris (LazyList.Cons(cur,rest)) = 
    let count = ref 1
    let ll = ref rest
    while LazyList.nonempty !ll && (LazyList.hd !ll) = cur do
        ll := LazyList.tl !ll
        incr count
    LazyList.cons !count
        (LazyList.consf cur (fun() ->
            if LazyList.nonempty !ll then
                NextMorris !ll
            else
                LazyList.empty()))

// Morris : LazyList<int> -> LazyList<LazyList<int>>
let Morris s =
    let rec MakeMorris ll =
        LazyList.consf ll (fun () ->
            let next = NextMorris ll
            MakeMorris next
        )
    MakeMorris s

// "main"
// Print the nth iteration, up to a certain depth
[1] |> LazyList.of_list |> Morris |> Seq.nth 3125 |> PrintList 10
[1] |> LazyList.of_list |> Morris |> Seq.nth 3126 |> PrintList 10
[1] |> LazyList.of_list |> Morris |> Seq.nth 100000 |> PrintList 35
[1] |> LazyList.of_list |> Morris |> Seq.nth 100001 |> PrintList 35

UPDATE2

If you just want to count, that's fine too:

let LLLength ll =
    let rec Loop ll acc =
        match ll with
        | LazyList.Cons(_,rest) -> Loop rest (acc+1N)
        | _ -> acc
    Loop ll 0N

let Main() =
    // don't do line below, it leaks
    //let hundredth = [1] |> LazyList.of_list |> Morris |> Seq.nth 100
    // if we only want to count length, make sure we throw away the only
    // copy as we traverse it to count
    [1] |> LazyList.of_list |> Morris |> Seq.nth 100
        |> LLLength |> printfn "%A" 
Main()    

The memory usage stays flat (under 16M on my box)... hasn't finished running yet, but I computed the 55th length fast, even on my slow box, so I think this should work just fine. Note also that I used 'bignum's for the length, since I think this will overflow an 'int'.

OTHER TIPS

I believe there are two main problems here:

  • Laziness is very inefficient so you can expect a lazy functional implementation to run orders of magnitude slower. For example, the Haskell implementation described here is 2,400× slower than the F# I give below. If you want a workaround, your best bet is probably to amortize the computations by bunching them together into eager batches where the batches are produced on-demand.

  • The Seq.append function is actually calling into C# code from IEnumerable and, consequently, its tail call doesn't get eliminated and you leak a bit more stack space every time you go through it. This shows up when you come to enumerate over the sequence.

The following is over 80× faster than your implementation at computing the length of the 50th subsequence but perhaps it is not lazy enough for you:

let next (xs: ResizeArray<_>) =
  let ys = ResizeArray()
  let add n x =
    if n > 0 then
      ys.Add n
      ys.Add x
  let mutable n = 0
  let mutable x = 0
  for i=0 to xs.Count-1 do
    let x' = xs.[i]
    if x=x' then
      n <- n + 1
    else
      add n x
      n <- 1
      x <- x'
  add n x
  ys

let morris =
  Seq.unfold (fun xs -> Some(xs, next xs)) (ResizeArray [1])

The core of this function is a fold over a ResizeArray that could be factored out and used functionally without too much performance degradation if you used a struct as the accumulator.

Just save the previous element that you looked for.

let morris2 data = seq {
    let cnt = ref 0
    let prev = ref (data |> Seq.nth 0)

     for cur in data do
        if cur <> !prev then
            yield! [!cnt; !prev]
            cnt := 1
            prev := cur
        else
            cnt := !cnt + 1

    yield! [!cnt; !prev]
}

let rec morrisSeq2 cur = seq {
    yield cur
    yield! morrisSeq2 (morris2 cur)
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top