Question

For fun, I want to design a convolutional neural net to recognize enemy NPCs in a first-person shooter. I have captured 100 JPEGs of NPCs as well as 100 JPEGs of non-NPCs, and I have successfully trained a really simple convnet to identify NPCs. This was really easy because the game actually highlights NPCs with a red marker so that humans can identify them, which makes it SUPER easy for a machine learning algorithm to find them.

Great, so now I can classify a screenshot of an NPC. The next step is to identify them in a data stream at 60 frames per second. We all know that the stupid little processors inside most cameras run a face detection algorithm in real time, so my i7 with 2 NVIDIA GPUs should be able to do this no sweat. So now I have to grab the screen buffer, capture a screenshot, feed it to my convnet, get the location of the NPC, and then move the mouse cursor to the center of that NPC.

Are there any easy-to-follow tutorials on running a convolutional neural net on a data stream like this?


Solution

I've recently started using OpenCV's Python bindings, and I found some good OpenCV tutorials on this website that I really liked: http://www.pyimagesearch.com/. OpenCV lets you use Haar cascades for fast face detection (by default it doesn't use a convolutional neural network but an optimized implementation of AdaBoost that evaluates image regions in stages, rejecting most of them early, for faster processing). OpenCV gives you each frame as a multidimensional NumPy array that you can then feed into your ML algorithm (e.g., in TensorFlow or some other library), although I think most people just use the built-in OpenCV face classifiers. In any case, I believe OpenCV can process up to 70 frames per second, so it should be fast enough for you.
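To make the Haar cascade route concrete, here is a minimal sketch, assuming the `opencv-python` package, which ships the stock frontal-face cascade under `cv2.data.haarcascades`. The helper names `largest_box`, `load_face_classifier`, and `detect_largest_face` are my own, not part of OpenCV, and the `scaleFactor` and `minNeighbors` values are just common starting points.

```python
def largest_box(boxes):
    """Pick the (x, y, w, h) box with the largest area, or None if there are none."""
    boxes = [tuple(int(v) for v in b) for b in boxes]
    if not boxes:
        return None
    return max(boxes, key=lambda b: b[2] * b[3])


def load_face_classifier():
    """Load the frontal-face Haar cascade bundled with opencv-python (assumed installed)."""
    import cv2  # imported lazily so largest_box stays usable without OpenCV
    path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    return cv2.CascadeClassifier(path)


def detect_largest_face(frame, classifier):
    """Run the cascade on one BGR frame (a NumPy array) and return the biggest hit."""
    import cv2
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # cascades work on grayscale
    boxes = classifier.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return largest_box(boxes)
```

Note that the bundled cascade only finds faces. For NPCs you would either have to train your own cascade or swap `detect_largest_face` for a call into your own convnet.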

The original paper that introduced Haar cascades (Viola and Jones, CVPR 2001): https://www.cs.cmu.edu/~efros/courses/LBMV07/Papers/viola-cvpr-01.pdf

The OpenCV documentation that further explains Haar Cascades: http://docs.opencv.org/3.1.0/d7/d8b/tutorial_py_face_detection.html#gsc.tab=0
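The capture, classify, aim loop described in the question could be structured roughly like this. It is only a sketch: `grab_frame`, `detect`, and `move_cursor` are hypothetical callables you would have to supply yourself (a screen grabber, your classifier, and a mouse-control call), and I am assuming `detect` returns an `(x, y, width, height)` box or `None`.

```python
import time


def box_center(box):
    """Return the (x, y) center of an (x, y, width, height) bounding box."""
    x, y, w, h = box
    return (x + w // 2, y + h // 2)


def run_loop(grab_frame, detect, move_cursor, max_frames, target_fps=60):
    """Run capture -> detect -> aim for max_frames frames.

    All three callables are placeholders supplied by the caller.
    Returns the number of frames in which a target was found.
    """
    frame_budget = 1.0 / target_fps
    hits = 0
    for _ in range(max_frames):
        start = time.perf_counter()
        frame = grab_frame()
        box = detect(frame)  # None when nothing was found in this frame
        if box is not None:
            move_cursor(*box_center(box))
            hits += 1
        # Sleep off whatever is left of this frame's time budget.
        remaining = frame_budget - (time.perf_counter() - start)
        if remaining > 0:
            time.sleep(remaining)
    return hits
```

If `detect` takes longer than the frame budget (about 16.7 ms at 60 fps), the loop simply runs slower than the target rate, which is an easy way to check whether your model is fast enough.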

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange