Forcing F# type inference on generics and interfaces to stay loose

https://stackoverflow.com/questions/1809405

05-07-2019
|

Question

We're gettin' hairy here. I've tested a bunch of tree-synchronizing code on concrete representations of data, and now I need to abstract it so that it can run with any source and target that support the right methods. [In practice, this will be sources like Documentum, SQL hierarchies, and filesystems; with destinations like Solr and a custom SQL cross-reference store.]

The tricky part is that when I'm recursing down a tree of type T and synchronizing into a tree of type U, at certain files I need to do a "sub-sync" of a second type V to that type U at the current node. (V represents hierarchal structure inside a file...) And the type inference engine in F# is driving me around in circles on this, as soon as I try to add the sub-syncing to V.

I'm representing this in a TreeComparison<'a,'b>, so the above stuff results in a TreeComparison<T,U> and a sub-comparison of TreeComparison<V,U>.

The problem is, as soon as I supply a concrete TreeComparison<V,'b> in one of the class methods, the V type propagates through all of the inferring, when I want that first type parameter to stay generic (when 'a :> ITree). Perhaps there is some typing I can do on the TreeComparison<V,'b> value? Or, more likely, the inference is actually telling me something is inherently broken in the way I'm thinking about this problem.

This was really tricky to compress, but I want to give working code you can paste into a script and experiment with, so there are a ton of types at the beginning... core stuff is right at the end if you want to skip. Most of the actual comparison and recursion across the types via ITree has been chopped because it's unnecessary to see the inference problem that I'm banging my head against.

open System

type TreeState<'a,'b> = //'
  | TreeNew of 'a
  | TreeDeleted of 'b
  | TreeBoth of 'a * 'b

type TreeNodeType = TreeFolder | TreeFile | TreeSection

type ITree =
  abstract NodeType: TreeNodeType
  abstract Path: string
      with get, set

type ITreeProvider<'a when 'a :> ITree> = //'
  abstract Children : 'a -> 'a seq
  abstract StateForPath : string -> 'a

type ITreeWriterProvider<'a when 'a :> ITree> = //'
  inherit ITreeProvider<'a> //'
  abstract Create: ITree -> 'a //'
  // In the real implementation, this supports:
  // abstract AddChild : 'a -> unit
  // abstract ModifyChild : 'a -> unit
  // abstract DeleteChild : 'a -> unit
  // abstract Commit : unit -> unit

/// Comparison varies on two types and takes a provider for the first and a writer provider for the second.
/// Then it synchronizes them. The sync code is added later because some of it is dependent on the concrete types.
type TreeComparison<'a,'b when 'a :> ITree and 'b :> ITree> =
  {
    State: TreeState<'a,'b> //'
    ATree: ITreeProvider<'a> //'
    BTree: ITreeWriterProvider<'b> //'
  }

  static member Create(
                        atree: ITreeProvider<'a>,
                        apath: string,
                        btree: ITreeWriterProvider<'b>,
                        bpath: string) =
      { 
        State = TreeBoth (atree.StateForPath apath, btree.StateForPath bpath)
        ATree = atree
        BTree = btree
      }

  member tree.CreateSubtree<'c when 'c :> ITree>
    (atree: ITreeProvider<'c>, apath: string, bpath: string)
      : TreeComparison<'c,'b> = //'
        TreeComparison.Create(atree, apath, tree.BTree, bpath)

/// Some hyper-simplified state types: imagine each is for a different kind of heirarchal database structure or filesystem
type T( data, path: string ) = class
  let mutable path = path
  let rand = (new Random()).NextDouble
  member x.Data = data
  // In the real implementations, these would fetch the child nodes for this state instance
  member x.Children() = Seq.empty<T>

  interface ITree with
    member tree.NodeType = 
      if rand() > 0.5 then TreeFolder
      else TreeFile
    member tree.Path
      with get() = path
      and set v = path <- v
end

type U(data, path: string) = class
  inherit T(data, path)
  member x.Children() = Seq.empty<U>
end

type V(data, path: string) = class
  inherit T(data, path)
  member x.Children() = Seq.empty<V>
  interface ITree with
    member tree.NodeType = TreeSection
end


// Now some classes to spin up and query for those state types [gross simplification makes these look pretty stupid]
type TProvider() = class
  interface ITreeProvider<T> with
    member this.Children x = x.Children()
    member this.StateForPath path = 
      new T("documentum", path)
end

type UProvider() = class
  interface ITreeProvider<U> with
    member this.Children x = x.Children()
    member this.StateForPath path = 
      new U("solr", path)
  interface ITreeWriterProvider<U> with
    member this.Create t =
      new U("whee", t.Path)
end

type VProvider(startTree: ITree, data: string) = class
  interface ITreeProvider<V> with
    member this.Children x = x.Children()
    member this.StateForPath path = 
      new V(data, path)
end


type TreeComparison<'a,'b when 'a :> ITree and 'b :> ITree> with
  member x.UpdateState (a:'a option) (b:'b option) = 
      { x with State = match a, b with
                        | None, None -> failwith "No state found in either A and B"
                        | Some a, None -> TreeNew a
                        | None, Some b -> TreeDeleted b
                        | Some a, Some b -> TreeBoth(a,b) }

  member x.ACurrent = match x.State with TreeNew a | TreeBoth (a,_) -> Some a | _ -> None
  member x.BCurrent = match x.State with TreeDeleted b | TreeBoth (_,b) -> Some b | _ -> None

  member x.CreateBFromA = 
    match x.ACurrent with
      | Some a -> x.BTree.Create a
      | _ -> failwith "Cannot create B from null A node"

  member x.Compare() =
    // Actual implementation does a bunch of mumbo-jumbo to compare with a custom IComparable wrapper
    //if not (x.ACurrent.Value = x.BCurrent.Value) then
      x.SyncStep()
    // And then some stuff to move the right way in the tree


  member internal tree.UpdateRenditions (source: ITree) (target: ITree) =
    let vp = new VProvider(source, source.Path) :> ITreeProvider<V>
    let docTree = tree.CreateSubtree(vp, source.Path, target.Path)
    docTree.Compare()

  member internal tree.UpdateITree (source: ITree) (target: ITree) =
    if not (source.NodeType = target.NodeType) then failwith "Nodes are incompatible types"
    if not (target.Path = source.Path) then target.Path <- source.Path
    if source.NodeType = TreeFile then tree.UpdateRenditions source target

  member internal tree.SyncStep() =
    match tree.State with
    | TreeNew a     -> 
        let target = tree.CreateBFromA
        tree.UpdateITree a target
        //tree.BTree.AddChild target
    | TreeBoth(a,b) ->
        let target = b
        tree.UpdateITree a target
        //tree.BTree.ModifyChild target
    | TreeDeleted b -> 
        ()
        //tree.BTree.DeleteChild b

  member t.Sync() =
    t.Compare()
    //t.BTree.Commit()


// Now I want to synchronize between a tree of type T and a tree of type U

let pt = new TProvider()
let ut = new UProvider()

let c = TreeComparison.Create(pt, "/start", ut , "/path")
c.Sync()

The problem likely revolves around CreateSubtree. If you comment out either:

The docTree.Compare() line
The tree.UpdateITree calls

and replace them with (), then the inference stays generic and everything is lovely.

This has been quite a puzzle. I've tried moving the "comparison" functions in the second chunk out of the type and defining them as recursive functions; I've tried a million ways of annotating or forcing the typing. I just don't get it!

The last solution I'm considering is making a completely separate (and duplicated) implementation of the comparison type and functions for the sub-syncing. But that's ugly and terrible.

Thanks if you read this far! Sheesh!

Solution

I have not analyzed the code enough to figure out why, but adding

  member internal tree.SyncStep() : unit =
                             //   ^^^^^^

seems to fix it.

EDIT

See also

Why does F# infer this type?

Understanding F# Value Restriction Errors

Unknown need for type annotation or cast

It takes experience to get a very deep understanding of the F# type inference algorithm's capabilities and limitations. But this example seems to be in a class of issues people run into when they do very advanced things. For members of a class, the F# inference algorithm does something like

Look at all the member explicit signatures to set up an initial type environment for all the members
For any members that have fully explicit signatures, fix their types to the explicit signature
Start reading the method bodies top to bottom, left to right (you'll encounter some 'forward references' that may involved unsolved type variables when doing this, and that can cause trouble, because...)
Solve all the member bodies concurrently (... but we have not done any 'generalization' yet, the part that would 'infer type parameters' rather than 'fix' what in theory could be a function of 'a to be whatever concrete type its first call site used)
Generalize (any remaining unsolved type variables become actual inferred type variables of generic methods)

That may not be exactly right; I don't know it well enough to describe the algorithm, I just have a sense of it. You can always go read the language spec.

What often happens is you get as far as bullet 3 and forcing the inferencer to start trying to concurrently solve/constrain all the method bodies when in fact it's unnecessary because, e.g. maybe some function has an easy concrete fixed type. Like SyncStep is unit->unit, but F# doesn't know it yet in step 3, since the signature was not explicit, it just says ok SyncStep has type "unit -> 'a" for some yet-unsolved type 'a and then now SyncStep is now unnecessarily complicating all the solving by introducing an unnecessary variable.

The way I found this was, the first warning (This construct causes code to be less generic than indicated by the type annotations. The type variable 'a has been constrained to be type 'V') was on the last line of the body of UpdateRenditions at the call to docTree.Compare(). Now I know that Compare() should be unit -> unit. So how could I possibly be getting a warning about generic-ness there? Ah, ok, the compiler doesn't know the return type is unit at that point, so it must thing that something is generic that's not. In fact, I could have added the return type annotation to Compare instead of SyncStep - either one works.

Anyway, I'm being very long-winded. To sum up

if you have a well-type program, it should 'work'
sometimes the details of the inference algorithm will require some 'extra' annotations... in the worst case you can 'add them all' and then 'subtract away the unnecessary ones'
by using the compiler warnings and some mental model of the inference algorithm, you can quickly steer towards the missing annotation with experience
very often the 'fix' is just to add one full type signature (including return type) to some key method that is 'declared late' but 'called early' (introducing a forward reference among the set of members)

Hope that helps!

OTHER TIPS

This is an old post, but it was the #1 result for my search. I have something to add that may help anyone else struggling with type inference as I (and the OP) have.

I've found it helps to think of inference as some exponential function of the structure of your function calls, what signatures those calls may have, and what signatures they may not have. It's very important to take all three into consideration.

Just for kicks, consider this function with three variables: sqrt(2*2*3)

Right away, it's obvious that it will simplify to 2 times some irrational number, which must be rounded (thus acquiring an infinite level of imprecision) to make it useful in everyday life.

The F# version feeds back into itself, compounding the error until the "rounding" culminates into undesirable type inferences. Because what a type may or may -not- be is a factor into this equation, it isn't always possible/easy to solve the problem directly with type annotations.

Now imagine that adding an additional perfectly generic (i.e. neutral) functional between the two problem functions, changing our equation to this: sqrt(2*2*4)

Suddenly, the result is perfectly rational, producing a perfectly accurate value of 4. By contrast, modifying the inversely related first and second values by 1 would have done absolutely nothing to help us.

Don't be afraid to modify structure if it has the potential to make or break your entire program. One extra function versus all the hoops you'd have to jump through to (continually) bend F# to your will is a very small price to pay, and chances are you can find a way to make the extra structure useful. In some situations, doing the above can turn a very, very, very argumentative program into a perfect little angel, for many functions to come.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow