Question

I am curious to know how training data should be constructed so that a trained model generalizes to examples that are not part of the training data. The problem I am facing right now is distinguishing, by their frequency response, time series that are generated from different distributions. I constructed $p$ examples each from Gaussian, Uniform and Poisson distributions, plus a kind of colored noise, say pink. The white noise examples (Gaussian, Uniform and Poisson) are labelled 1 and the colored noise 0. Using a neural network, the classification works fine. I then wanted to do a sensitivity analysis by checking whether the trained network can classify white noise from another distribution, and also another colored noise, say red. Both tests failed: the NN could not classify them. But as soon as I included the red noise and the new kind of white noise in the training data and tested on a different trial (time series), the NN could classify them.
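Roughly, the setup looks like the following sketch (the series length, the pink-noise generator, the log-power-spectrum features, and the network settings here are placeholders I chose for illustration, not my exact values):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1024   # samples per time series (assumed)
p = 500    # examples per distribution (assumed)

def standardize(x):
    return (x - x.mean()) / x.std()

def pink_noise(n, rng):
    """Shape white Gaussian noise to a 1/f power spectrum."""
    spec = np.fft.rfft(rng.normal(size=n))
    f = np.fft.rfftfreq(n)
    spec[1:] /= np.sqrt(f[1:])   # amplitude ~ 1/sqrt(f) -> power ~ 1/f
    spec[0] = 0.0
    return standardize(np.fft.irfft(spec, n))

def feats(x):
    """Log power spectrum as the frequency-response feature vector."""
    return np.log(np.abs(np.fft.rfft(x)) ** 2 + 1e-12)

series = (
    [standardize(rng.normal(size=n)) for _ in range(p)]                       # Gaussian white
    + [standardize(rng.uniform(-1, 1, size=n)) for _ in range(p)]             # Uniform white
    + [standardize(rng.poisson(5, size=n).astype(float)) for _ in range(p)]   # Poisson white
    + [pink_noise(n, rng) for _ in range(p)]                                  # pink (colored)
)
X = np.stack([feats(x) for x in series])
y = np.array([1] * (3 * p) + [0] * p)   # white -> 1, colored -> 0

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)
print("in-distribution test accuracy:", clf.score(X_te, y_te))
```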

QUESTION: This behavior makes me wonder whether machine learning algorithms are incapable of recognizing examples from systems they were not trained on, even when those examples have properties similar to the training examples. In this case, even though the new white noise appears similar to the training white noise, it is generated from a different distribution (system), so the training data apparently must include examples from every generating mechanism; otherwise the ML model fails to recognize them at test time. Is this the usual behavior? (The sensitivity test I ran looks roughly like the sketch below.)
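A minimal sketch of that test, continuing from the sketch above (it reuses `feats`, `standardize`, `n`, `p`, `rng` and the trained `clf`; Laplace is just one example of an unseen white-noise distribution, and red noise is generated here as a cumulative sum of Gaussian white noise):

```python
import numpy as np

# White noise from a distribution the network never saw, e.g. Laplace (assumed choice).
laplace_white = [standardize(rng.laplace(size=n)) for _ in range(p)]

# Red (Brownian) noise: cumulative sum of Gaussian white noise, power ~ 1/f^2.
red = [standardize(np.cumsum(rng.normal(size=n))) for _ in range(p)]

X_ood = np.stack([feats(x) for x in laplace_white + red])
y_ood = np.array([1] * p + [0] * p)   # same labelling convention as in training

print("out-of-distribution accuracy:", clf.score(X_ood, y_ood))
```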

No correct solution
