F # / implementazione dell'algoritmo “Accelerator v2” DFT probabilmente errata

https://stackoverflow.com/questions/3128420

30-09-2019
|

Domanda

Sto cercando di sperimentare concetti software defined radio. Da questo articolo ho cercato di implementare una GPU parallelismo Discrete Fourier Transform.

Sono abbastanza sicuro che avrei potuto pre-calcolare 90 gradi delle cos sin (i) (i) e poi basta girare e ripetere piuttosto che quello che sto facendo in questo codice e che ciò sarebbe accelerarlo. Ma finora, non ho nemmeno pensare che sto ricevendo risposte corrette. Un ingresso tutti zeri dà 0 risultato come mi aspetto, ma tutti 0.5 come ingressi dà 78.9985886f (mi aspetto un risultato 0 in questo caso). In sostanza, sto solo generalmente confuso. Non ho i dati di input bene e non so cosa fare con il risultato o come verificare esso.

Questa domanda è legata al mio altro post qui

open Microsoft.ParallelArrays
open System

 // X64MulticoreTarget is faster on my machine, unexpectedly
let target = new DX9Target() // new X64MulticoreTarget()

ignore(target.ToArray1D(new FloatParallelArray([| 0.0f |]))) // Dummy operation to warm up the GPU

let stopwatch = new System.Diagnostics.Stopwatch() // For benchmarking

let Hz = 50.0f
let fStep = (2.0f * float32(Math.PI)) / Hz
let shift = 0.0f // offset, once we have to adjust for the last batch of samples of a stream

// If I knew that the periodic function is periodic 
// at whole-number intervals, I think I could keep 
// shift within a smaller range to support streams 
// without overflowing shift - but I haven't 
// figured that out

//let elements = 8192 // maximum for a 1D array - makes sense as 2^13
//let elements = 7240 // maximum on my machine for a 2D array, but why?
let elements = 7240

// need good data!!
let buffer : float32[,] = Array2D.init<float32> elements elements (fun i j -> 0.5f) //(float32(i * elements) + float32(j))) 

let input = new FloatParallelArray(buffer)
let seqN : float32[,] = Array2D.init<float32> elements elements (fun i j -> (float32(i * elements) + float32(j)))
let steps = new FloatParallelArray(seqN)
let shiftedSteps = ParallelArrays.Add(shift, steps)
let increments = ParallelArrays.Multiply(fStep, steps)
let cos_i = ParallelArrays.Cos(increments) // Real component series
let sin_i = ParallelArrays.Sin(increments) // Imaginary component series

stopwatch.Start()
// From the documentation, I think ParallelArrays.Multiply does standard element by 
// element multiplication, not matrix multiplication
// Then we sum each element for each complex component (I don't understand the relationship 
// of this, or the importance of the generalization to complex numbers)
let real = target.ToArray1D(ParallelArrays.Sum(ParallelArrays.Multiply(input, cos_i))).[0]
let imag = target.ToArray1D(ParallelArrays.Sum(ParallelArrays.Multiply(input, sin_i))).[0]
printf "%A in " ((real * real) + (imag * imag)) // sum the squares for the presence of the frequency
stopwatch.Stop()

printfn "%A" stopwatch.ElapsedMilliseconds

ignorare (System.Console.ReadKey ())

Soluzione

condivido la sua sorpresa che la vostra risposta non più vicino è a zero. Io suggerirei di scrivere codice ingenuo per eseguire la DFT in F # e vedere se è possibile rintracciare la fonte della discrepanza.

Ecco cosa penso che stai cercando di fare:

let N = 7240
let F = 1.0f/50.0f
let pi = single System.Math.PI

let signal = [| for i in 1 .. N*N -> 0.5f |]

let real = 
  seq { for i in 0 .. N*N-1 -> signal.[i] * (cos (2.0f * pi * F * (single i))) }
  |> Seq.sum

let img = 
  seq { for i in 0 .. N*N-1 -> signal.[i] * (sin (2.0f * pi * F * (single i))) }
  |> Seq.sum

let power = real*real + img*img

Speriamo che si può utilizzare questo codice ingenuo per avere un'intuizione meglio come il codice acceleratore deve comportarsi, che potrebbe guidarvi nel vostro test del codice dell'acceleratore. Tenete a mente che una delle ragioni per la discrepanza può essere semplicemente la precisione dei calcoli - ci sono ~ 52 milioni di elementi nei array, in modo da accumulare un errore totale di 79 non potrebbe in realtà essere troppo male. FWIW, ho un potere di ~ 0.05 quando si esegue il codice di sopra singola precisione, ma un potere di ~ 4e-18 quando si utilizza il codice equivalente con i numeri di doppia precisione.

Altri suggerimenti

Due suggerimenti:

assicurarsi che non stai in qualche modo confondere gradi radianti
provare a farlo sans-il parallelismo, o semplicemente con asyncs s 'F # per il parallelismo

(in F #, se si dispone di una serie di carri

let a : float[] = ...

allora si puo 'aggiungere un passaggio a tutti loro in parallelo' per produrre un nuovo array con

let aShift = a |> (fun x -> async { return x + shift }) 
               |> Async.Parallel |> Async.RunSynchronously

(anche se mi aspetto che questo potrebbe essere più lento che solo facendo un ciclo sincrono).)

Autorizzato sotto: CC-BY-SA insieme a attribuzione

Non affiliato a StackOverflow