Question

I have a set of input/output training data. A few samples are:

Input          Output
[1 0 0 0 0]    [1 0 1 0 0]
[1 1 0 0 1]    [1 1 0 0 0]
[1 0 1 1 0]    [1 1 0 1 0]

and so on. I need to use the standard deviation of the entire output set as a threshold, so I calculate the mean standard deviation over the outputs. The application is that a model, when presented with this data, should learn to predict the output. My objective function includes a condition: the distance, defined as the sum of the square roots of the Euclidean distances between the model output and the desired target for each input, should be less than a threshold.
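To make this concrete, here is a minimal sketch of one possible reading of the setup, assuming NumPy; the function names, the stand-in "predictions", and the choice of averaging per-component standard deviations into a single threshold are illustrative assumptions, not a prescribed method:

```python
import numpy as np

# Toy training data from the question (inputs and desired binary outputs).
X = np.array([[1, 0, 0, 0, 0],
              [1, 1, 0, 0, 1],
              [1, 0, 1, 1, 0]])
Y = np.array([[1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [1, 1, 0, 1, 0]])

def total_distance(predictions, targets):
    """Sum over samples of the square root of the Euclidean distance
    between each predicted vector and its target."""
    euclidean = np.linalg.norm(predictions - targets, axis=1)
    return np.sum(np.sqrt(euclidean))

# One possible reading of "standard deviation of the output as a threshold":
# the mean of the per-component standard deviations of the target vectors.
threshold = np.mean(np.std(Y, axis=0))

predictions = Y + 0.1  # stand-in for an actual model's output
constraint_satisfied = total_distance(predictions, Y) < threshold
print(threshold, total_distance(predictions, Y), constraint_satisfied)
```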

My question is: how should I justify the use of this threshold? Is it justified? I read an article which says it is common to take the standard deviation as the threshold.

In my case, what does taking the standard deviation of the training-data outputs actually mean?

Solution

There is no deep intuition/philosophy behind the standard deviation (or variance); statisticians like these measures largely because they are mathematically easy to work with, thanks to various nice properties. See https://math.stackexchange.com/questions/875034/does-expected-absolute-deviation-or-expected-absolute-deviation-range-exist

There are quite a few other ways to perform various forms of outlier detection, belief revision, etc., but they can be more mathematically challenging to work with.

OTHER TIPS

I am not sure this idea applies. You are looking at the definition of standard deviation for a univariate value, but your output is multivariate. There are multivariate analogs, but it's not clear why you would need one here.
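To make the univariate-versus-multivariate point concrete, here is a small sketch (assuming NumPy, purely illustrative) contrasting the per-component standard deviations of the output vectors with a single pooled number; collapsing to one number is a modelling choice, not "the" standard deviation of a multivariate output:

```python
import numpy as np

# Target output vectors from the question.
Y = np.array([[1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [1, 1, 0, 1, 0]])

# The univariate definition only applies per component:
# one standard deviation per output column.
per_component_std = np.std(Y, axis=0)   # shape (5,)

# Reducing these to a single scalar (e.g. their mean) is an extra choice.
pooled = per_component_std.mean()

print(per_component_std, pooled)
```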

It sounds like you are minimizing the squared error, or Euclidean distance, between the model output and the known correct output. That's fine, and it makes me think you're predicting the multivariate output shown here. What is the threshold doing, then? What exactly is supposed to be less than what measure, computed from what?

Licensed under: CC-BY-SA with attribution