Question

Does anyone know of 'prior art' regarding the following subject:

  • I have data that take a decent amount of time to load: they are historical levels for various stocks.
  • I would like to preload them somehow, to avoid the latency when using my app.
  • However, preloading them in one chunk at startup makes my app unresponsive at first, which is not user friendly.

So I would like to not load my data unless the user is not requesting any and is just playing with what he already has, in which case I would like to fetch it little by little in the background. So it is neither 'lazy' nor 'eager', but rather 'lazy when you need' and 'eager when you can', hence the acronym LWYNEWYC.

I have made the following, which seems to work, but I just wonder if there is a recognized and blessed approach for such a thing?

let r = LoggingFakeRepo () :> IQuoteRepository
r.getHisto "1" |> ignore  //prints Getting histo for 1 when called

let rc =  RepoCached (r) :> IQuoteRepository
rc.getHisto "1" |> ignore //prints Getting histo for 1 the first time only

let rcc =  RepoCachedEager (r) :> IQuoteRepository
rcc.getHisto "100" |> ignore  //prints Getting histo 1..100 by itself BUT
                              //prints Getting histo 100 immediately when called

And the classes:

type IQuoteRepository = 
   abstract getUnderlyings : string seq
   abstract getHisto :  string -> string

type LoggingFakeRepo () =
   interface IQuoteRepository with 
      member x.getUnderlyings = printfn "getting underlyings"
                                [1 .. 100] |> List.map string :> _

      member x.getHisto udl = printfn "getting histo for %A" udl
                              "I am a historical dataset in a disguised party"

type RepoCached (rep : IQuoteRepository) =
   let memoize f =
     let cache = new System.Collections.Generic.Dictionary<_, _>()
     fun x ->
        if cache.ContainsKey(x) then cache.[x]
        else let res = f x
             cache.[x] <- res
             res
   let udls = lazy (rep.getUnderlyings )
   let gethistom = memoize rep.getHisto

   interface IQuoteRepository with 
      member x.getUnderlyings = udls.Force()
      member x.getHisto udl = gethistom udl

type Message = string * AsyncReplyChannel<string>
type RepoCachedEager (rep : IQuoteRepository) =
   let udls = rep.getUnderlyings

   let agent = MailboxProcessor<Message>.Start(fun inbox ->
      let repocached = RepoCached (rep) :> IQuoteRepository
      let rec loop l =
         async {  try
                      let timeout = if l |> List.isEmpty then -1 else 50  // wait forever once everything is preloaded, otherwise 50 ms idle timeout
                     let! (udl, replyChannel) = inbox.Receive(timeout)
                     replyChannel.Reply(repocached.getHisto udl)
                     do! loop l
                  with 
                  | :? System.TimeoutException -> 
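                      // safe: the 50 ms timeout can only fire while l is non-empty (otherwise timeout = -1)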
                     let udl::xs = l
                     repocached.getHisto udl |> ignore
                     do! loop xs
          }
      loop (udls |> Seq.toList))

   interface IQuoteRepository with 
      member x.getUnderlyings = udls
      member x.getHisto udl = agent.PostAndReply(fun reply -> udl, reply)

Solution

I like your solution. I think using an agent to implement background loading with a timeout is a great way to go: agents can nicely encapsulate mutable state, so it is clearly safe, and you can encode the behaviour you want quite easily.

I think asynchronous sequences might be another useful abstraction (if I'm correct, they are available in FSharpx these days). An asynchronous sequence represents a computation that asynchronously produces more values, so they might be a good way to separate the data loader from the rest of the code.

I think you'll still need an agent to synchronize at some point, but you can nicely separate different concerns using async sequences.

The code to load the data might look something like this:

let loadStockPrices repo = asyncSeq {
  // TODO: Not sure how you detect that the repository has no more data...
  while true do
    // Get next item from the repository, preferably asynchronously!
    let! data = repo.AsyncGetNextHistoricalValue()
    // Return the value to the caller...
    yield data }

This code represents the data loader and separates it from the code that uses it. In the agent that consumes the data source, you can use AsyncSeq.iterAsync to consume the values and do something with them.

With iterAsync, the function that you specify as the consumer is asynchronous. It may block (e.g. using Async.Sleep), and while it is blocked, the source - that is, your loader - is blocked as well. This is quite a nice, implicit way to control the loader from the code that consumes the data.
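For example, a rough sketch of that consuming side might look like the code below. This is only an illustration: cacheAgent stands for a hypothetical MailboxProcessor that owns your cache, startBackgroundLoading is a made-up name, and the Async.Sleep is just there to show how pausing the consumer also pauses the loader:

let startBackgroundLoading (cacheAgent : MailboxProcessor<_>) repo =
   loadStockPrices repo
   |> AsyncSeq.iterAsync (fun data -> async {
         // hand the freshly loaded value to the agent that owns the cache
         cacheAgent.Post data
         // while the consumer pauses here, the loader above is paused too
         do! Async.Sleep 50 })
   |> Async.Start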

A feature that is not in the library yet (but would be useful) is a partially eager evaluator that takes an AsyncSeq<'T> and returns a new AsyncSeq<'T>, but obtains a certain number of elements from the source as soon as possible and caches them (so that the consumer does not have to wait for a value, as long as the source can produce values fast enough).
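Such a wrapper can be sketched on top of an agent acting as a bounded buffer. The following is only a hedged sketch under the assumptions above (the asyncSeq builder and AsyncSeq.iterAsync from FSharpx; prefetch and BufferMsg are made-up names, not library functions): a background loop eagerly drains the source while the buffer has room, and the returned sequence serves items from the buffer, so the consumer usually does not have to wait.

type BufferMsg<'T> =
   | Put of 'T option * AsyncReplyChannel<unit>   // producer offers an item (None = source finished)
   | Take of AsyncReplyChannel<'T option>         // consumer asks for the next item

let prefetch bufferSize (source : AsyncSeq<'T>) : AsyncSeq<'T> =
   // bounded-buffer agent: holds at most bufferSize pre-loaded items
   // (for simplicity the agent is never shut down)
   let buffer = MailboxProcessor<BufferMsg<'T>>.Start(fun inbox ->
      let queue = System.Collections.Generic.Queue<'T option>()
      let rec loop () = async {
         let! msg =
            if queue.Count = 0 then
               // nothing buffered yet: only a Put can make progress
               inbox.Scan (function Put _ as m -> Some (async.Return m) | _ -> None)
            elif queue.Count >= bufferSize then
               // buffer full: keep the producer waiting until a Take arrives
               inbox.Scan (function Take _ as m -> Some (async.Return m) | _ -> None)
            else inbox.Receive ()
         match msg with
         | Put (item, ack) ->
            queue.Enqueue item
            ack.Reply ()
            return! loop ()
         | Take reply ->
            reply.Reply (queue.Dequeue ())
            return! loop () }
      loop ())

   // producer: eagerly pull from the source, pausing whenever the buffer is full
   let producer = async {
      do! source |> AsyncSeq.iterAsync (fun item ->
             buffer.PostAndAsyncReply (fun ch -> Put (Some item, ch)))
      do! buffer.PostAndAsyncReply (fun ch -> Put (None, ch)) }
   Async.Start producer

   // consumer view: serve items from the buffer, usually without waiting
   let rec consume () = asyncSeq {
      let! item = buffer.PostAndAsyncReply Take
      match item with
      | Some v ->
         yield v
         yield! consume ()
      | None -> () }
   consume ()

A consumer could then write something like loadStockPrices repo |> prefetch 20 |> AsyncSeq.iterAsync ... to keep roughly twenty historical series loaded ahead of what it has already processed.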

Licensed under: CC-BY-SA with attribution