How to do a left join on a non unique column/index in Deedle

Question

This is a great question - I have to admit, there is currently no elegant way to do this with Deedle. Could you please submit an issue to GitHub to make sure we keep track of this and add some solution?

As you say, Deedle does not let you have duplicate values in the keys currently - although, your Pandas solution also does not use duplicate keys - you simply use the fact that Pandas lets you specify the column to use when joining (and I think this would be great addition to Deedle).

Here is one way to do what you wanted - but not very nice. I think using pivoting would be another option (there is a nice pivot table function in the latest source code - not yet on NuGet).

I used groupByRows and nest to turn your data frames into series grouped by the workOrderCode (each item now contains a frame with all rows that have the same work order code):

let workOrders = 
    Frame.ofColumns [ 
        "workOrderCode" =?> Series.ofValues [ 20050; 20051; 20060 ]
        "workOrderDescription" =?> Series.ofValues [ "Door Repair"; "Lift Replacement"; "Window Cleaning" ]]
   |> Frame.groupRowsByInt "workOrderCode"
   |> Frame.nest

let workOrderScores = 
    Frame.ofColumns [ 
       "workOrderCode" => Series.ofValues [ 20050; 20050; 20051 ]
       "runTime" => Series.ofValues [ 20100112; 20100130; 20100215 ] 
       "score" => Series.ofValues [ 100; 120; 80 ]] 
   |> Frame.groupRowsByInt "workOrderCode"
   |> Frame.nest

Now we can join the two series (because their work order codes are the keys). However, then you get one or two data frames for each joined order code and there is quite a lot of work needed to outer join the rows of the two frames:

// Join the two series to align frames with the same work order code
Series.zip workOrders workOrderScores
|> Series.map(fun _ (orders, scores) -> 
    match orders, scores with
    | OptionalValue.Present s1, OptionalValue.Present s2 ->
        // There is a frame with some rows with the specified code in both 
        // work orders and work order scores - we return a cross product of their rows
        [ for r1 in s1.Rows.Values do
          for r2 in s2.Rows.Values do 
          // Drop workOrderCode from one series (they are the same in both)
          // and append the two rows & return that as the result
          yield Series.append r1 (Series.filter (fun k _ -> k <> "workOrderCode") r2) ]
        |> Frame.ofRowsOrdinal
    // If left or right value is missing, we just return the columns
    // that are available (others will be filled with NaN)
    | OptionalValue.Present s, _ 
    | _, OptionalValue.Present s -> s)
|> Frame.unnest
|> Frame.indexRowsOrdinally

This might be slow (especially in the NuGet version). If you work on more data, please try building latest version of Deedle from sources (and if that does not help, please submit an issue - we should look into this!)