Question

Neural networks in many domains (audio, video, image, text/NLP) can achieve great results. In NLP in particular, models built around an attention mechanism (the Transformer, BERT) have achieved astonishing results, without manual preprocessing of the data (text documents).

I am interested in applying neural networks to time-series. However, in this domain, it looks like most people apply manual feature engineering by either:

  • pivoting the matrix of events so that there is a column for each time observation and a row for each entity (device, patient, ...), or
  • manually generating sliding windows and feeding the snippets to an RNN/LSTM (a rough sketch of both approaches follows below).
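
For concreteness, here is a minimal sketch of both approaches in Python/PyTorch; the column names, window length, and layer sizes are illustrative assumptions rather than anything from a specific source:

    import numpy as np
    import pandas as pd
    import torch
    import torch.nn as nn

    # Approach 1: pivot a long table of events so each row is one entity
    # (device, patient, ...) and each column is one time observation.
    events = pd.DataFrame({"device": ["a", "a", "b", "b"],
                           "t": [0, 1, 0, 1],
                           "value": [1.0, 2.0, 3.0, 4.0]})
    wide = events.pivot(index="device", columns="t", values="value")

    # Approach 2: hand-rolled sliding windows fed to an LSTM.
    def make_windows(series, window=24, horizon=1):
        """Slice a (T, n_features) array into (window, n_features) snippets
        plus the value to predict `horizon` steps after each snippet."""
        X, y = [], []
        for t in range(len(series) - window - horizon + 1):
            X.append(series[t:t + window])
            y.append(series[t + window + horizon - 1])
        return np.stack(X), np.stack(y)

    series = np.random.randn(1000, 3).astype(np.float32)  # toy 3-sensor series
    X, y = make_windows(series)

    lstm = nn.LSTM(input_size=3, hidden_size=32, batch_first=True)
    head = nn.Linear(32, 3)
    out, _ = lstm(torch.from_numpy(X))   # (batch, window, hidden)
    pred = head(out[:, -1])              # forecast from the last hidden state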

Am I overlooking something? Why can't I find people using attention? Wouldn't this be much more convenient (automated)?


Solution

It is an interesting question. I would not completely agree, though, that most time-series models don't use attention; however, there is not as much documentation available on the web as there is for other applications.

LSTNet was one of the first papers to propose using an LSTM + attention mechanism for multivariate time series forecasting. Temporal Pattern Attention for Multivariate Time Series Forecasting by Shun-Yao Shih et al. focused on applying attention specifically attuned to multivariate data.
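
The exact LSTNet and TPA architectures are more involved (convolutional and recurrent components, specific scoring functions); the following is only a rough sketch of the underlying idea, attention over an LSTM's hidden states for multivariate forecasting, with all dimensions chosen for illustration:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LSTMAttentionForecaster(nn.Module):
        """Attend over the LSTM's hidden states across time, then forecast
        the next value of each variable from the context vector."""
        def __init__(self, n_features, hidden=64):
            super().__init__()
            self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
            self.score = nn.Linear(hidden, 1)        # simple additive-style scoring
            self.out = nn.Linear(hidden * 2, n_features)

        def forward(self, x):                        # x: (batch, time, n_features)
            states, (h_n, _) = self.lstm(x)          # states: (batch, time, hidden)
            weights = F.softmax(self.score(states), dim=1)
            context = (weights * states).sum(dim=1)  # weighted sum over time
            return self.out(torch.cat([context, h_n[-1]], dim=-1))

    model = LSTMAttentionForecaster(n_features=3)
    forecast = model(torch.randn(8, 24, 3))          # -> (8, 3)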

Attend and Diagnose leverages self-attention on medical time series data, which is multivariate and contains information like a patient's heart rate, SO2, blood pressure, etc.
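
The paper's actual model has additional details of its own; as a hedged illustration of the general idea, self-attention applied to a multivariate clinical-style series could look roughly like this (the feature count and classification task here are assumed, not taken from the paper):

    import math
    import torch
    import torch.nn as nn

    class SelfAttentionClassifier(nn.Module):
        """Self-attention over a multivariate series (e.g. vitals),
        pooled into one prediction per sequence."""
        def __init__(self, n_features, d_model=64, n_classes=2):
            super().__init__()
            self.embed = nn.Linear(n_features, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(d_model, n_classes)

        def forward(self, x):                     # x: (batch, time, n_features)
            h = self.embed(x)
            # Sinusoidal positional encoding so the model sees time order.
            t = torch.arange(x.size(1), device=x.device).unsqueeze(1)
            div = torch.exp(torch.arange(0, h.size(-1), 2, device=x.device)
                            * (-math.log(10000.0) / h.size(-1)))
            pe = torch.zeros(x.size(1), h.size(-1), device=x.device)
            pe[:, 0::2] = torch.sin(t * div)
            pe[:, 1::2] = torch.cos(t * div)
            h = self.encoder(h + pe)
            return self.head(h.mean(dim=1))       # pool over time

    model = SelfAttentionClassifier(n_features=4)  # e.g. heart rate, SO2, BP, ...
    logits = model(torch.randn(8, 48, 4))          # -> (8, 2)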

A good article for further study of this topic is: https://towardsdatascience.com/attention-for-time-series-classification-and-forecasting-261723e0006d

Further quoting from the above article: "self-attention and related architectures have led to improvements in several time series forecasting use cases, however, altogether they have not seen widespread adaptation. This likely revolves around several factors such as the memory bottleneck, difficulty encoding positional information, focus on pointwise values, and lack of research around handling multivariate sequences. Additionally, outside of NLP many researchers are probably not familiar with self-attention and its potential."

I don't completely agree with the last statement; nonetheless, I do agree that the benefits of attention have not yet captured the attention of researchers outside of NLP to the extent that they should have.

Licensed under: CC-BY-SA with attribution