Question

I'm working on a problem with data from a continuous real-valued signal. The goal is to use ML to smooth the signal based on past data. To accomplish this, the signal is windowed into periods that are meaningful within the domain. The problem is that these periods vary considerably in length.

I've reviewed this question and this question, and neither solves the problem; they are more about how to deal with missing values.

Since denoising autoencoders are based on matrix multiplication, variable-length input presents a serious problem. What is the standard approach in such a situation? Should I define an arbitrary (large) window size, pad windows that are too small, and truncate windows that are too large? Or is there a better approach for variable-length inputs?
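For reference, the pad-or-truncate option described above can be sketched as follows. The target length and pad value are arbitrary illustrative choices, not values prescribed by any library:

```python
import numpy as np

def to_fixed_length(window, target_len, pad_value=0.0):
    """Pad a short window with a constant value, or truncate a long one,
    so that every window ends up with the same length."""
    w = np.asarray(window, dtype=float)
    if len(w) >= target_len:
        return w[:target_len]                       # truncate long windows
    pad = np.full(target_len - len(w), pad_value)
    return np.concatenate([w, pad])                 # pad short windows

# Three windows of different lengths, mapped to a common length of 100.
windows = [np.sin(np.linspace(0, 1, n)) for n in (50, 80, 120)]
batch = np.stack([to_fixed_length(w, 100) for w in windows])
print(batch.shape)  # (3, 100)
```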


Solution

Recurrent Neural Networks can deal with variable-length data directly.
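The key property is that a recurrent cell folds a sequence of any length into a fixed-size hidden state. A minimal numpy sketch of an Elman-style recurrence (with randomly initialised weights standing in for learned ones, and a hidden size of 8 chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 8  # hidden size is an arbitrary illustrative choice

# Randomly initialised recurrence weights; in practice these are learned.
W_in = rng.normal(scale=0.1, size=(hidden,))
W_h = rng.normal(scale=0.1, size=(hidden, hidden))

def encode(sequence):
    """Fold a 1-D sequence of ANY length into a fixed-size hidden state."""
    h = np.zeros(hidden)
    for x in sequence:
        h = np.tanh(W_in * x + W_h @ h)
    return h

short = encode(np.sin(np.linspace(0, 1, 40)))    # 40 samples
long_ = encode(np.sin(np.linspace(0, 1, 400)))   # 400 samples
print(short.shape, long_.shape)  # both (8,)
```

Both encodings have the same shape, so they can feed any downstream fixed-input model regardless of the original window length.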

Another idea (which I have not tested; it just came to mind) is a histogram approach: create fixed-size windows, discretize the data from those windows (e.g. via vector quantization with k-means), and then build a histogram of how often each quantized vector appears.

You could also use HMMs (Hidden Markov Models), which handle variable-length sequences natively.
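The reason HMMs cope with variable length is that the forward algorithm scores a sequence one observation at a time. A toy sketch with made-up parameters (in practice they would be learned, e.g. with Baum-Welch):

```python
import numpy as np

# Toy 2-state HMM over 3 discrete observation symbols. All parameters
# below are invented for illustration.
pi = np.array([0.6, 0.4])                          # initial state probs
A = np.array([[0.7, 0.3], [0.4, 0.6]])             # transition matrix
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])   # emission probs

def likelihood(obs):
    """Forward algorithm: P(obs | model) for a sequence of ANY length."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

p_short = likelihood([0, 1])                # 2 observations
p_long = likelihood([0, 1, 2, 2, 0, 1])     # 6 observations, same model
```

The same model scores sequences of any length, so no padding or truncation is needed.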

Transformations (e.g. the Fourier transform from the time domain to the frequency domain) might also come in handy.
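A transform can map windows of different lengths onto a common feature grid. One way, sketched below, is to evaluate the FFT with a fixed number of points (numpy zero-pads or truncates as needed); `n_fft` and `n_keep` are illustrative choices:

```python
import numpy as np

def spectral_feature(window, n_fft=256, n_keep=16):
    """Map a window of any length to a fixed-size spectral descriptor.
    np.fft.rfft(window, n=n_fft) zero-pads or truncates to n_fft samples,
    so the output length depends only on n_fft, not on the window."""
    spectrum = np.abs(np.fft.rfft(window, n=n_fft))
    return spectrum[:n_keep]

features = []
for n in (60, 130, 512):  # three windows of very different lengths
    w = np.sin(2 * np.pi * 5 * np.linspace(0, 1, n))
    features.append(spectral_feature(w))
print([f.shape for f in features])  # [(16,), (16,), (16,)]
```

Note that zero-padding fixes the output dimension but not the frequency resolution per window, so this is a pragmatic featurization rather than an exact spectral comparison.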

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange