Question

Can't reinforcement learning be used without the help of other learning algorithms such as SVMs and MLPs trained with backpropagation? I consulted two papers:

  1. Paper 1
  2. Paper 2

Both use other machine learning methods in their inner loops.

Solution

You do not need additional learning algorithms to perform reinforcement learning in simple systems where you can explore all states. For those, simple iterative Q-learning can do very well, as can a variety of similar techniques such as Temporal Difference learning and SARSA. All of these can be used without neural networks, provided your problem is not too big (typically under a few million state/action pairs).

The simplest form of Q-learning just stores and updates a table of <state, action> => <estimated reward> entries. There is no deeper statistical model inside it. Q-learning reads estimates from this table in order to choose an action, and then writes back a more refined estimate after each action.
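To make that concrete, here is a minimal sketch of tabular Q-learning in Python. It is an illustration under assumptions, not a definitive implementation: the `env` object and its `reset`, `step` and `available_actions` methods are hypothetical stand-ins for a small, discrete environment, and the hyperparameters are placeholders.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning: the whole 'model' is this dictionary of
    (state, action) -> estimated return."""
    q = defaultdict(float)

    def greedy(state, actions):
        # Pick the action with the highest current estimate.
        return max(actions, key=lambda a: q[(state, a)])

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            actions = env.available_actions(state)  # assumed helper on env
            # Epsilon-greedy: mostly exploit the table, sometimes explore.
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = greedy(state, actions)
            next_state, reward, done = env.step(action)  # assumed interface
            # Q-learning update: nudge the entry toward the bootstrapped
            # target reward + gamma * max_a' Q(s', a').
            if done:
                target = reward
            else:
                next_actions = env.available_actions(next_state)
                target = reward + gamma * q[(next_state, greedy(next_state, next_actions))]
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q
```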

Q-learning and related techniques such as Temporal Difference learning are sometimes called model free. However, this does not refer to the absence of a statistical model such as a neural net. Instead, it means that you do not need a model of the system you are trying to optimise, such as knowledge of all the probabilities of outcomes and consequences of actions in a game. In model-free RL, all learning can be done simply by experiencing the system as an agent (if you do have a model, it may still be used for simulation or planning). When considering whether or not you need a neural network, the relevant terms are tabular, used for methods that keep an explicit value estimate for every possible state or state/action pair, and function approximation, used for methods that instead learn a parameterised function of the state; the latter is where neural networks enter RL.

For large, complex problems, which may even have infinitely many possible states, tabular methods are not feasible, and you need generalised value estimates based on some function of the state. In those cases, you can use a neural network as a function approximator that can estimate the returns of states similar to those already seen. The neural network replaces the table of tabular Q-learning. However, the neural network (or other supervised ML algorithm) does not perform the learning process by itself: you still need an "outer" RL method that explores states and actions in order to provide the data that the network learns from.
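As a hedged sketch of that division of labour, using PyTorch purely for illustration: the network below stands in for the Q-table, while the surrounding RL loop (not shown) must still explore and supply the (state, action, reward, next state) transitions. The dimensions and hyperparameters are assumed placeholders.

```python
import torch
import torch.nn as nn

# Placeholder sizes; in practice these come from your environment.
STATE_DIM, N_ACTIONS = 8, 4

# The network replaces the Q-table: state vector in, one value per action out.
q_net = nn.Sequential(
    nn.Linear(STATE_DIM, 64),
    nn.ReLU(),
    nn.Linear(64, N_ACTIONS),
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def td_update(state, action, reward, next_state, done, gamma=0.99):
    """One temporal-difference update on a single transition, supplied by
    the outer RL loop. state/next_state: float tensors of shape (STATE_DIM,)."""
    q_sa = q_net(state)[action]
    with torch.no_grad():
        bootstrap = 0.0 if done else gamma * q_net(next_state).max().item()
    target = torch.tensor(reward + bootstrap)
    loss = (q_sa - target) ** 2  # squared TD error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```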

OTHER TIPS

"Reinforcement Learning" means the result of the learning algorithm is a policy; a function that takes a set of inputs and returns a decision. "Supervised Learning", in contrast, learns a function that returns a prediction. They are different types of task.
Multi-Layer Perceptron and Support Vector Machine are architectures i.e. forms for the learnt function in either case.
There is no reason not to try different architectures on different tasks.
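A small sketch of that point, assuming PyTorch and placeholder dimensions: the same MLP architecture can serve either task, and only the output head and the training signal differ.

```python
import torch.nn as nn

# Placeholder sizes: an 8-dimensional input, 4 possible decisions.
def mlp_backbone():
    return nn.Sequential(nn.Linear(8, 32), nn.ReLU())

# Supervised learning: the learnt function returns a prediction.
predictor = nn.Sequential(mlp_backbone(), nn.Linear(32, 1))

# Reinforcement learning: the learnt function scores actions, i.e. it
# defines a policy once you pick the highest-scoring action.
policy_head = nn.Sequential(mlp_backbone(), nn.Linear(32, 4))
```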

Although not required, it is extremely common to use some approximation scheme once you start working with larger problems.

If you are thinking of value-based model-free RL, a typical problem is that the state space is huge. An approximation scheme then becomes necessary, not just to store the value estimates, but because it lets you generalise and take advantage of structure in your data. That is where function approximation methods (regression, neural networks, etc.) can be beneficial.

For model-based approaches, something different happens: you need to build up a model of the environment from data. Using sampled transitions, you approximate the transition and reward functions so that you can later use planning methods. Again, supervised learning is applicable; a sketch follows.
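For instance, in a small discrete problem the model can be fitted by simple counting, as in this hedged Python sketch (the names and tabular setting are illustrative; for large problems you would fit a supervised regressor instead):

```python
from collections import defaultdict

# Count-based estimates of the environment's dynamics, fitted from
# sampled transitions (state, action, reward, next_state).
transition_counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
reward_totals = defaultdict(float)                         # (s, a) -> summed reward
visit_counts = defaultdict(int)                            # (s, a) -> visits

def record(state, action, reward, next_state):
    transition_counts[(state, action)][next_state] += 1
    reward_totals[(state, action)] += reward
    visit_counts[(state, action)] += 1

def transition_prob(state, action, next_state):
    # Maximum-likelihood estimate of P(s' | s, a).
    return transition_counts[(state, action)][next_state] / visit_counts[(state, action)]

def expected_reward(state, action):
    # Sample mean of the observed rewards for (s, a).
    return reward_totals[(state, action)] / visit_counts[(state, action)]
```

With these estimates in hand, planning methods such as value iteration can be run against the learned model rather than the real environment.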

I don't think you need prior knowledge of methods like SVMs and MLPs. Reinforcement learning is a third type of machine learning, alongside supervised learning (which includes SVMs and MLPs) and unsupervised learning. It is very different from the other two because it is learning from interaction (agent-environment interaction); the trade-off between exploration and exploitation is the key point.

The Markov decision process is the basic framework for reinforcement learning, and it is very different from the settings of the other two types of learning. I highly recommend the textbook "Reinforcement Learning: An Introduction" by Richard S. Sutton and Andrew G. Barto; that is the book I am reading now. The language is very easy to follow and the content is comprehensive.

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange