Question

I'm building a remote-controlled, self-driving car for fun. I'm using a Raspberry Pi as the onboard computer, with various peripherals, such as a Raspberry Pi camera and distance sensors, for feedback on the car's surroundings. I'm using OpenCV to turn the video frames into tensors, and I'm using Google's TensorFlow to build a convolutional neural network to learn road boundaries and obstacles. My main question is: should I use supervised learning to teach the car to drive, or should I provide objectives and penalties and do reinforcement learning (i.e., get to point B as fast as possible while not hitting anything and staying within the road boundaries)? Below is a list of the pros and cons that I've come up with.

Supervised learning pros:

  • The inputs to the learning algorithm are pretty straightforward. The car learns to associate the video frame tensor and the sensor distance readings with forward, backward, and angular wheel displacement (see the model sketch after this list)
  • I can more or less teach the car to drive exactly how I want (without overfitting, of course)
  • I've done tons of supervised learning problems before, and this approach seems to comfortably fit my existing skill set
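
To make that mapping concrete, here's a minimal sketch in TensorFlow's Keras API. The frame size, the number of distance sensors, and the layer sizes are placeholder guesses, not a tested architecture:

    import tensorflow as tf

    def build_driving_model():
        # Assumed input shapes: a 66x200 RGB frame and three distance sensors.
        frame = tf.keras.Input(shape=(66, 200, 3), name="camera_frame")
        distances = tf.keras.Input(shape=(3,), name="distance_readings")

        # Small convolutional stack to extract road/obstacle features.
        x = tf.keras.layers.Conv2D(24, 5, strides=2, activation="relu")(frame)
        x = tf.keras.layers.Conv2D(36, 5, strides=2, activation="relu")(x)
        x = tf.keras.layers.Conv2D(48, 3, strides=2, activation="relu")(x)
        x = tf.keras.layers.Flatten()(x)

        # Fuse the image features with the distance readings.
        x = tf.keras.layers.Concatenate()([x, distances])
        x = tf.keras.layers.Dense(64, activation="relu")(x)

        # Two regression targets: signed throttle (forward/backward) and steering.
        controls = tf.keras.layers.Dense(2, name="throttle_and_steering")(x)

        model = tf.keras.Model(inputs=[frame, distances], outputs=controls)
        model.compile(optimizer="adam", loss="mse")
        return model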

Supervised learning cons:

  • It's not clear how to teach speed, and the correct speed is pretty arbitrary as long as the car doesn't go so fast that it veers off the road. I suppose I could drive fast during training, but that seems like a crude approach. Maybe I could add a constant input during training that corresponds to the speed for that training session, and then set that input to whatever speed I want when the learned model is deployed? (See the sketch below.)
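
Here's a rough sketch of what I mean, reusing the placeholder shapes from the model sketch above (the extra input and everything around it are illustrative, not tested):

    import tensorflow as tf

    frame = tf.keras.Input(shape=(66, 200, 3), name="camera_frame")
    distances = tf.keras.Input(shape=(3,), name="distance_readings")
    # Held constant for a whole training session; set freely at deployment.
    target_speed = tf.keras.Input(shape=(1,), name="target_speed")

    x = tf.keras.layers.Conv2D(24, 5, strides=2, activation="relu")(frame)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Concatenate()([x, distances, target_speed])
    x = tf.keras.layers.Dense(64, activation="relu")(x)
    controls = tf.keras.layers.Dense(2, name="throttle_and_steering")(x)

    model = tf.keras.Model([frame, distances, target_speed], controls)
    model.compile(optimizer="adam", loss="mse")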

Reinforcement learning pros:

  • If I build my car with the specific purpose of racing other people's self-driving cars, reinforcement learning seems to be the natural way to tell my car to "get there as fast as possible"
  • I've read that RL is sometimes used for autonomous drones, so in theory it should be easier for cars because I don't have to worry about up and down

Reinforcement learning cons:

  • I feel like reinforcement learning would require a lot of additional sensors, and frankly my foot-long car doesn't have that much space inside, considering that it also needs to fit a battery, the Raspberry Pi, and a breadboard

  • The car will behave very erratically at first, perhaps so erratically that it destroys itself. It might also take an unreasonably long time to learn (e.g., months or years)

  • I can't incorporate explicit rules later on, e.g., stop at a toy red light. With supervised learning, I could incorporate numerous SL algorithms (e.g., a Haar cascade classifier for identifying stoplights) into a configurable rules engine that gets evaluated between video frames. The rules engine could then override the driving SL algorithm when it sees a red stoplight, even if the stoplight was never part of the driving algorithm's training. RL seems too continuous for this (i.e., stop only at the terminal state). See the override sketch after this list.
  • I don't have a lot of experience with applied reinforcement learning, although I definitely want to learn it regardless
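
To illustrate the rules-engine idea: OpenCV doesn't ship a stoplight cascade, so the cascade file below stands for a hypothetical custom-trained one, and predict_controls() stands in for the driving SL model:

    import cv2

    # Hypothetical custom-trained cascade; OpenCV provides no stoplight model.
    stoplight = cv2.CascadeClassifier("stoplight_cascade.xml")

    def next_command(frame, distances, predict_controls):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hits = stoplight.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(hits) > 0:
            return (0.0, 0.0)  # rule fires: zero throttle and steering, NN skipped
        return predict_controls(frame, distances)  # otherwise defer to the SL driver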

Solution

I'd suggest you try a hybrid approach:

  • First, train your car in a supervised fashion by demonstration. Just drive it yourself and use your commands as labels (see the recording sketch after this list). This will get you all the pros of SL.
  • Then, fine-tune your neural net using reinforcement learning. You don't need extra sensors for that: the rewards can be obtained from the distance sensors (larger distances = better) and from the speed itself (see the reward sketch after this list). This will give you the pros of RL and train your NN toward the correct goal of driving fast while avoiding obstacles, instead of the goal of imitating you.
  • Combining both approaches gets you the pros of both SL and RL while avoiding their cons. RL won't start from random behavior, just from small, gradual deviations from what you taught the NN. A similar approach was applied successfully by Google DeepMind with AlphaGo.
  • You can always include explicit rules on top of this. Implement them with high priority and call the NN only when there is no explicit rule for the current situation. This is reminiscent of the Subsumption Architecture.
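
For the demonstration step, the recording loop can be very simple. In the sketch below, read_frame(), read_distances(), and read_rc_command() are hypothetical wrappers around the camera, the distance sensors, and the commands you send from the transmitter:

    import numpy as np

    def record_session(num_steps):
        frames, sensor_rows, commands = [], [], []
        for _ in range(num_steps):
            frames.append(read_frame())            # camera frame as a tensor
            sensor_rows.append(read_distances())   # distance sensor readings
            commands.append(read_rc_command())     # (throttle, steering) you sent
        # These arrays become the inputs and labels for supervised training.
        np.savez("demonstrations.npz",
                 frames=np.array(frames),
                 distances=np.array(sensor_rows),
                 controls=np.array(commands))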
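
And a rough sketch of such a reward signal; the weights and the clearance threshold are guesses you would tune on the actual car:

    def reward(distances, speed, crashed):
        # Assumed units: distances in meters, speed in m/s.
        if crashed:
            return -100.0                 # large penalty for hitting anything
        clearance = min(distances)        # nearest obstacle on any sensor
        close_call = max(0.0, 0.3 - clearance)  # punish only clearances under 0.3 m
        return 1.0 * speed - 10.0 * close_call
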
Licensed under: CC-BY-SA with attribution