Research in random forest algorithms able to switch data sets

https://datascience.stackexchange.com/questions/9560

16-10-2019

Question

I'm curious as to whether research has been done into random forests that combine unsupervised with supervised learning in a way that allows a single algorithm to find patterns in, and work with, multiple different data sets. I have searched every way I can think of for research on this and have come up empty. Can anyone point me in the right direction?

Solution

Semi-Supervised Learning

The combination of unsupervised learning and supervised learning is referred to as semi-supervised learning, which is the concept that I believe you are searching for.

Label propagation is often cited when outlining the heuristics of semi-supervised learning. The essence is to employ clustering, but to use a small set of known cases to derive (or propagate) the labels of the clusters. Hence one can use a small set of labeled cases to classify a much larger set of unlabeled data.

Here are some references:

Wikipedia has an entry on semi-supervised learning.
The scikit-learn User Guide is often a useful starting point and has a label propagation routine.
There are, in fact, papers treating semi-supervised random forest models.
Another one here
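
As a rough illustration of how a random forest can be used in a semi-supervised way, the sketch below wraps scikit-learn's RandomForestClassifier in SelfTrainingClassifier. Note that this is generic self-training, not the method of any of the papers mentioned above: the forest is fit on the labeled points, confidently predicted unlabeled points are added as pseudo-labels, and the process repeats. The synthetic data set, the roughly 90% of hidden labels, and the 0.9 confidence threshold are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import SelfTrainingClassifier

# Synthetic binary classification problem (assumption).
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           class_sep=2.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Hide roughly 90% of the training labels; -1 marks "unlabeled".
rng = np.random.default_rng(0)
y_semi = y_train.copy()
y_semi[rng.random(y_train.shape[0]) < 0.9] = -1

# Self-training: pseudo-label unlabeled points whose predicted class
# probability is at least 0.9, then refit the forest and repeat.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
self_trained = SelfTrainingClassifier(forest, threshold=0.9)
self_trained.fit(X_train, y_semi)

test_accuracy = self_trained.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.2f}")
```

Whether the pseudo-labels help depends heavily on the data, so it is worth comparing against a forest trained on the labeled points alone.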

Hope this helps!

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange