What is the state-of-the-art in unsupervised learning on temporal data?

https://stackoverflow.com/questions/11854154

25-06-2021

Question

I'm looking for an overview of the state-of-the-art methods that

- find temporal patterns (of arbitrary length) in temporal data, and
- are unsupervised (no labels).

In other words, given a stream/sequence of (potentially high-dimensional) data, how do you find the common subsequences that best capture the structure in the data?

Any pointers to recent developments or papers (that go beyond HMMs, hopefully) are welcome!
Is this problem maybe well-understood in a more specific application domain, like
- motion capture
- speech processing
- natural language processing
- game action sequences
- stock market prediction?
In addition, are some of these methods general enough to deal with
- highly noisy data
- hierarchical structure
- irregular spacing on the time axis?

(I'm not interested in detecting known patterns, nor in classifying or segmenting the sequences.)

Solution

There has been a lot of recent emphasis on non-parametric HMMs, extensions to infinite state spaces, as well as factorial models, explaining an observation using a set of factors rather than a single mixture component.

Here are some interesting papers to start with (just google the paper names):

"Beam Sampling for the Infinite Hidden Markov Model"
"The Infinite Factorial Hidden Markov Model"
"Bayesian Nonparametric Inference of Switching Dynamic Linear Models"
"Sharing features among dynamical systems with beta processes"

The experiments sections of these papers discuss applications in text modeling, speaker diarization, and motion capture, among other things.
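
To make the difference between "a single mixture component" and "a set of factors" a bit more concrete, here is a minimal generative sketch. It is not taken from any of the papers above: it uses a fixed, finite number of states and factors (the nonparametric models let these grow with the data), and the transition matrix, means, weights, and noise level are made-up values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_chain(trans, length, rng):
    """Sample a state sequence from a Markov chain with transition matrix `trans`."""
    n_states = trans.shape[0]
    states = np.empty(length, dtype=int)
    states[0] = rng.integers(n_states)          # uniform initial state (assumption)
    for t in range(1, length):
        states[t] = rng.choice(n_states, p=trans[states[t - 1]])
    return states

T = 100
# Hypothetical "sticky" two-state transition matrix: states tend to persist.
sticky = np.array([[0.95, 0.05],
                   [0.05, 0.95]])

# Ordinary HMM: one chain, and each observation is explained by
# exactly one state (one mixture component) plus noise.
hmm_states = sample_chain(sticky, T, rng)
hmm_means = np.array([-1.0, 1.0])
hmm_obs = hmm_means[hmm_states] + 0.1 * rng.standard_normal(T)

# Factorial HMM: M independent binary chains, and each observation is
# explained by the sum of the contributions of all active factors.
M = 3
factor_states = np.stack(
    [sample_chain(sticky, T, rng) for _ in range(M)], axis=1
)                                               # shape (T, M), entries 0 or 1
weights = np.array([1.0, 2.0, 4.0])             # hypothetical per-factor effects
fhmm_obs = factor_states @ weights + 0.1 * rng.standard_normal(T)
```

With M binary factors the factorial model can represent up to 2^M distinct mean levels while only learning M chains, which is roughly why it is attractive when several independent causes overlap in time.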

OTHER TIPS

I don't know what kind of data you are analysing, but I would suggest (from a dynamical systems analysis point of view) taking a look at:

- Recurrence plots (easily found by googling the term)
- Time-delay embedding (may unfold potential relationships between the different dimensions of the data) plus a distance matrix (to study neighborhood patterns, maybe)

Note that this is just another way to represent your data and then analyse it based on that new representation. Just a suggestion!
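
For what it's worth, both suggestions fit in a few lines of NumPy. The following is only a rough sketch on a toy 1-D sine wave: the function names `delay_embed` and `recurrence_matrix` are my own, and the embedding dimension, delay, and threshold `eps` are arbitrary values that you would need to tune for real data.

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Time-delay embedding of a 1-D series.

    Row i is [x[i], x[i + tau], ..., x[i + (dim - 1) * tau]].
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this dim and tau")
    return np.column_stack([x[k * tau : k * tau + n] for k in range(dim)])

def recurrence_matrix(points, eps):
    """Pairwise Euclidean distances and the thresholded recurrence matrix.

    R[i, j] is 1 when embedded points i and j are within eps of each other.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return (dist <= eps).astype(int), dist

# Toy signal: a sine wave with a period of 25 samples.
t = np.arange(200)
x = np.sin(2 * np.pi * t / 25)

emb = delay_embed(x, dim=3, tau=6)       # shape (188, 3)
R, dist = recurrence_matrix(emb, eps=0.2)
```

Plotting `R` as an image (for example with matplotlib's `imshow`) gives the recurrence plot. For this periodic signal it shows diagonal lines spaced 25 samples apart; repeated subsequences in real data show up as similar diagonal segments.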

Licensed under: CC-BY-SA with attribution
