Question

  • I am trying to use a multi-layer neural network (NN) to implement the transition probability function of a Partially Observable Markov Process.
  • The inputs to the NN are the current state, the selected action, and the resulting state; the output is a probability in [0, 1] (the probability that performing the selected action in the current state leads to the resulting state).
  • In training, I fed those inputs into the NN and taught it output = 1.0 for every transition that actually occurred.

The problem:
For nearly every test case the output probability is near 0.95; no output was under 0.9! Even for nearly impossible results it gives that high a probability.

PS: I think this is because I trained it only on transitions that happened, not on ones that did not. But I cannot, at every step in the episode, teach it output = 0.0 for each transition that did not happen!

Any suggestions on how to overcome this problem? Or perhaps another way to use an NN, or another way to implement the probability function?

Thanks


Solution

The problem is that the probabilities over all possible following states have to sum to 1. If you construct your network as described, that is not guaranteed. Two possible alternatives come to mind; both assume discrete states.

  1. When making a prediction, run the network once for each possible following state. Afterwards, normalize by dividing each output by the sum of all outputs.
  2. Use one output per possible following state. You can then use a softmax layer (as in classification) and interpret the outputs, which range from 0 to 1 and sum to 1, as probabilities.

These two are roughly equivalent from a mathematical perspective; a sketch of the second option follows.
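Here is a minimal sketch of the second option in Python with PyTorch (an assumption; the question does not say which framework is in use), with hypothetical state and action counts. Training with a cross-entropy loss on the observed next state automatically pushes the probabilities of all other states toward 0, which also removes the need to hand-label un-happened transitions:

    import torch
    import torch.nn as nn

    N_STATES, N_ACTIONS, HIDDEN = 10, 4, 64   # hypothetical sizes

    # Maps a one-hot (state, action) pair to one logit per possible next state.
    model = nn.Sequential(
        nn.Linear(N_STATES + N_ACTIONS, HIDDEN),
        nn.ReLU(),
        nn.Linear(HIDDEN, N_STATES),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    def train_step(state, action, next_state_idx):
        """One update on an observed transition (s, a) -> s'."""
        logits = model(torch.cat([state, action], dim=-1))
        # cross_entropy applies softmax internally: it raises the probability
        # of the observed next state and lowers all the others.
        loss = nn.functional.cross_entropy(logits.unsqueeze(0),
                                           next_state_idx.unsqueeze(0))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    def transition_probs(state, action):
        """P(s' | s, a) over all next states; sums to 1 by construction."""
        with torch.no_grad():
            logits = model(torch.cat([state, action], dim=-1))
        return torch.softmax(logits, dim=-1)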

In the case of continuous variables, you will have to assume a distribution (e.g. a multivariate Gaussian) and use the parameters of that distribution (e.g. its mean and covariance) as the network's outputs.
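For the continuous case, a sketch along those lines (again PyTorch, and assuming a diagonal Gaussian over the next state for simplicity, i.e. one variance per dimension instead of a full covariance matrix):

    import torch
    import torch.nn as nn

    STATE_DIM, ACTION_DIM, HIDDEN = 3, 2, 64   # hypothetical sizes

    trunk = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, HIDDEN), nn.ReLU())
    mean_head = nn.Linear(HIDDEN, STATE_DIM)       # mean of the next state
    log_std_head = nn.Linear(HIDDEN, STATE_DIM)    # log std, so std stays positive

    def next_state_distribution(state, action):
        h = trunk(torch.cat([state, action], dim=-1))
        return torch.distributions.Normal(mean_head(h), log_std_head(h).exp())

    # Train by maximizing the log-likelihood of the observed next state:
    #   loss = -next_state_distribution(s, a).log_prob(s_next).sum()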

OTHER TIPS

When fitting the NN you might want to cover a wider range of data: is there any training data that you want fitted to a probability close to 0? If there isn't, I suspect you will get poor results. As a first step I'd try adding some such examples to the training set (one way to do that is sketched below).
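One way to get such near-zero examples without labeling every un-happened transition is negative sampling: for each observed transition, also add a few randomly drawn result states with target 0.0. A sketch in plain Python (the sampling scheme here is an illustration, not a prescription):

    import random

    def make_training_batch(observed, all_states, n_negatives=3):
        """Expand (state, action, next_state) triples into labeled examples.

        Each real transition gets target 1.0; a few randomly sampled other
        result states get target 0.0, standing in for un-happened cases.
        """
        batch = []
        for state, action, next_state in observed:
            batch.append((state, action, next_state, 1.0))
            others = [s for s in all_states if s != next_state]
            for neg in random.sample(others, k=min(n_negatives, len(others))):
                batch.append((state, action, neg, 0.0))
        return batch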

Also, how are you training the NN? Have you tried other training methods? And what about activation functions? Perhaps experiment with some different ones.

With neural nets, some trial and error in choosing the model is going to help. (Sorry if all this isn't specific enough.)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow