Text Parsing and Nested Collection Transposition in F#

https://stackoverflow.com/questions/18340708

25-06-2022

Question

I parse data from a csv file that looks like this:

X,..,..,Dx,..,..
Y,..,..,Dy,..,..
X,..,..,Dx,..,..
Y,..,..,Dy,..,..
X,..,..,Dx,..,..
Y,..,..,Dy,..,..

Each row is an element of an array of a type I defined and used with FileHelpers. This probably isn't relevant, but I'm including it in case someone knows a trick I could use at this stage of the process with FileHelpers.

I'm only interested in the pairs X,Dx and Y,Dy. The data could have more than just X and Y, e.g. (X,Dx); (Y,Dy); (Z,Dz); ...

I'll call the number of letters nL.

The goal is to get the averages of Dx, Dy, ... for each group by processing an array of all D's which has SUM(nIterations) * nL elements.

I have a list of numbers of iterations:

let nIterations = [2000; 2000; 2000; 1000; 500; 400; 400; 400; 300; 300]

For each of these numbers, I will have that many "letter groups." So the rows of interest for nIterations.[0] are rows 0 to (nIterations.[0] * nL) - 1.

To get the rows of interest for nIterations.[i], I make a list "nis" which is the result of a scan operation performed on nIterations.

let nis = List.scan (fun x e -> x + e) 0 nIterations
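
To make the offsets concrete, here is a small sketch of what the scan produces for the list above. It only restates the two definitions from the question, with (+) in place of the explicit lambda; nothing else is assumed.

```fsharp
// Iteration counts from the question.
let nIterations = [2000; 2000; 2000; 1000; 500; 400; 400; 400; 300; 300]

// Running totals starting at 0: nis.[i] is the number of letter groups
// that come before group i, so nis.[i] * nL is that group's first row.
let nis = List.scan (+) 0 nIterations

printfn "%A" nis
// [0; 2000; 4000; 6000; 7000; 7500; 7900; 8300; 8700; 9000; 9300]
```

Note that List.scan returns one more element than its input; the last entry (9300) is the total number of letter groups.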

Then, to isolate the nIterations.[i] group:

let group = Array.sub Ds (nis.[i]*nL) (nIterations.[i]*nL)

Here's the whole thing:

nIterations |> List.mapi (fun i ni ->
    let igroup = Array.sub Ds (nis.[i]*nL) (ni*nL)
    let groupedbyLetter = (chunk nL igroup)

    let sums = seq { for idx in 0..(nL - 1) do
                         let d = seq { for g in groupedbyLetter do 
                                           yield (Seq.head (Seq.skip idx g)) }
                         yield d |> Seq.sum }

    sums |> Seq.map (fun x -> (x / (float ni))) ) |> List.ofSeq

That "chunk" function is one I found on SO:

let rec chunk n xs =
    if Seq.isEmpty xs then Seq.empty
    else
        let (ys,zs) = splitAt n xs
        Seq.append (Seq.singleton ys) (chunk n zs)

I have verified this works, and gets me what I want - a size nL collection of size nIterations.Length collections.

The problem is speed - this only works on small data sets; the sizes I'm working with in the example I've given are too big. It gets "hung" at the chunk function.
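
Since igroup is already an array, one possible way to avoid the lazy, recursive chunk is Array.chunkBySize, which is available in FSharp.Core 4.0 and later. The following is only a sketch on a tiny hypothetical igroup (3 letters, 4 iterations), not a measured fix for the full data set; the name letterAverages is made up for illustration.

```fsharp
// Hypothetical stand-in for one igroup: nL = 3 letters, 4 iterations.
let nL = 3
let igroup =
    [| 1.0; 10.0; 100.0;
       2.0; 20.0; 200.0;
       3.0; 30.0; 300.0;
       4.0; 40.0; 400.0 |]

// Split the flat array into rows of nL values in a single pass.
// groupedByLetter.[j] holds [| Dx; Dy; Dz |] for iteration j.
let groupedByLetter = Array.chunkBySize nL igroup

// Average column k across all rows, for each letter k.
let letterAverages =
    Array.init nL (fun k -> groupedByLetter |> Array.averageBy (fun row -> row.[k]))

printfn "%A" letterAverages
// [|2.5; 25.0; 250.0|]
```

Each row here is a real array, so row.[k] is a constant-time lookup instead of a Seq.skip followed by Seq.head.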

So my question is: how do I go about improving the speed of this whole process, and/or what is the best (or at least a better) way to do that "transposition"?

I figure I could:

try to rearrange the data as I'm reading it in
try to index the elements directly
try breaking the process into smaller stages or "passes"
???
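
The second idea, indexing the elements directly, could look roughly like the sketch below. It assumes Ds is a flat float array laid out as Dx, Dy, ..., Dx, Dy, ... with nL values per iteration, as described above. The function name averagesByIndex and the demo data are hypothetical.

```fsharp
// Sketch: compute per-letter averages for every group by indexing into
// the flat array, without building intermediate chunks.
let averagesByIndex (nL: int) (nIterations: int list) (Ds: float[]) =
    // offsets.[i] = number of iterations that come before group i
    let offsets = List.scan (+) 0 nIterations
    nIterations
    |> List.mapi (fun i ni ->
        let start = offsets.[i] * nL
        Array.init nL (fun k ->
            // Letter k of iteration j in this group sits at start + j * nL + k.
            let mutable sum = 0.0
            for j in 0 .. ni - 1 do
                sum <- sum + Ds.[start + j * nL + k]
            sum / float ni))

// Hypothetical data: 2 letters, one group of 2 iterations, one group of 1.
let demo = averagesByIndex 2 [2; 1] [| 1.0; 10.0; 3.0; 30.0; 5.0; 50.0 |]
printfn "%A" demo
// [[|2.0; 20.0|]; [|5.0; 50.0|]]
```

This reads each element of Ds exactly once and allocates only the small result arrays.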

Solution

I got it.

let averages =
    (nIterations |> List.mapi (fun i ni ->
        let igroup = Array.sub Ds (nis.[i]*nL) (ni*nL)
        let groupedbyLetter = 
            [| for a in 1..nL..igroup.Length do 
                   yield igroup.[(a - 1)..(a - 1)+(nL-1)] |]

        [| for i in 0..(nL - 1) do
               yield [| for j in 0..(groupedbyLetter.Length - 1) do
                            yield groupedbyLetter.[j].[i] |] 
               |> Array.average |]) )

let columns = [| for i in 0..(nL - 1) do
                     yield [| for j in 0..(nIterations.Length - 1) do
                                  yield averages.[j].[i] |] 
                     |]

The "columns" value just transposes the data again so that I can easily print a table like this:

               ----Average Ds----
nIterations       X    Y    Z
   2000          0.2  0.7  1.2
    ...          ...  ...  ...
    ...          ...  ...  ...

e.g. averages returns

[[x1,y1,z1,..], [x2,y2,z2,..], ... ]

and columns gives me

[ [x1,x2,..], [y1,y2,..], [z1,z2,..], ...]
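
As an aside, newer FSharp.Core versions (4.1 and later, as far as I know) ship Array.transpose, which performs the same row-to-column flip as the hand-written comprehension. A minimal sketch with hypothetical values; sampleAverages and sampleColumns are illustrative names, not part of the code above.

```fsharp
// Hypothetical per-group averages: one row per nIterations entry,
// one column per letter (X, Y, Z).
let sampleAverages = [ [| 0.2; 0.7; 1.2 |]; [| 0.3; 0.8; 1.3 |] ]

// Array.transpose accepts any sequence of arrays and returns an array of
// arrays, so the list of rows becomes one array per letter.
let sampleColumns = Array.transpose sampleAverages

printfn "%A" sampleColumns
// [|[|0.2; 0.3|]; [|0.7; 0.8|]; [|1.2; 1.3|]|]
```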

Licensed under: CC-BY-SA with attribution
