Question

I have some sequential data (e.g. audio, video, or text) and I am using the following approach to classify sequences. I am sure there's a name for it, but I can't think of it:

vectors =
    t1,[v1_0....v1_n]
    t2,[v2_0....v2_n]
        :
        :
    tm,[vm_0....vm_n]

where t1..tm are the time offsets and v1..vm are the feature vectors.

From this data, I oversample by extracting windows with different win_size and step_size values:

[t0,win_size,step_size, np.median(vectors[frame_id:frame_id+win_size],axis=0)]


win_size = the number of frames in the window to mean/median ahead
step_size = how many frames to advance with each step
np.median(vectors[frame_id:frame_id+win_size], axis=0) = column-wise median (or mean) across the window, producing a single resultant vector

I use the above to train a classifier, which should then be able to generate predictions at multiple scales (e.g. with a large window size).
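The windowed-oversampling step described above can be sketched as follows; `window_features` is a hypothetical helper name, and the scale list is an assumed example, not from the original post:

```python
import numpy as np

def window_features(vectors, win_size, step_size):
    """Slide a window over the frame-level feature matrix and
    summarise each window with its column-wise median.

    vectors: (m, n) array, one n-dimensional feature vector per time step.
    Returns a list of (start_frame, win_size, step_size, median_vector) tuples.
    """
    samples = []
    for frame_id in range(0, len(vectors) - win_size + 1, step_size):
        med = np.median(vectors[frame_id:frame_id + win_size], axis=0)
        samples.append((frame_id, win_size, step_size, med))
    return samples

# Generate training samples at several (assumed) scales from one clip.
vectors = np.random.rand(100, 8)          # m=100 frames, n=8 features
train = []
for win_size, step_size in [(5, 2), (10, 5), (20, 10)]:
    train.extend(window_features(vectors, win_size, step_size))
```

Each scale yields its own set of summarised vectors, so the classifier sees the same clip at several temporal resolutions.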

Edit: for prediction, extract a similar set of vectors from the query clip, then aggregate the per-window predictions somehow to form a "consensus".
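One way to aggregate the per-window predictions into a consensus is a simple majority vote; this is a sketch under the assumption that `clf` is any fitted classifier exposing a scikit-learn-style `.predict`, and `consensus_predict` is a hypothetical helper name:

```python
import numpy as np
from collections import Counter

def consensus_predict(clf, vectors, scales):
    """Classify a clip by majority vote over per-window predictions.

    clf: fitted classifier with a .predict method (assumed interface).
    vectors: (m, n) frame-level feature matrix of the query clip.
    scales: list of (win_size, step_size) pairs, matching training.
    """
    votes = []
    for win_size, step_size in scales:
        for frame_id in range(0, len(vectors) - win_size + 1, step_size):
            med = np.median(vectors[frame_id:frame_id + win_size], axis=0)
            votes.append(clf.predict(med.reshape(1, -1))[0])
    # Most common predicted label across all windows and scales.
    return Counter(votes).most_common(1)[0][0]
```

Other aggregation choices (averaging predicted probabilities, or weighting larger windows more heavily) fit the same loop structure.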


Solution

I would call that a multi-headed convolutional model.

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange