How do I choose the inputs to an artificial neural network?

Question 1

When it comes to ANN frameworks, everyone has their own preferences. I recently used the Encog framework to implement an image-processing project and found it very easy to use.

Now, coming to your problem statement: "a learning method that enables minesweepers to avoid colliding with mines" has a very wide scope. What exactly will the input to your ANN be? You will have to decide on the input based on whether the system is going to run on a real robot or in a simulation environment.

If you are trying to implement something like ALVINN, unsupervised learning can be ruled out: ALVINN is trained with supervised learning, using camera images paired with a human driver's steering commands.

In a simulation environment, the best option is to build a grid map of the environment from the simulated sensor data. The occupancy grid surrounding the robot can then form a good input to the robot's ANN.
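As a rough illustration of using the occupancy grid around the robot as the ANN input, here is a minimal sketch. It assumes the map is a 2D list of occupancy values in [0, 1] (1.0 = occupied, e.g. a mine), that the robot's position is known as a grid cell, and that cells outside the map are treated as occupied. The function name `local_window_inputs` is hypothetical, and the window is not rotated to the robot's heading, which a real system would probably want.

```python
from typing import List

Grid = List[List[float]]  # occupancy values in [0, 1]; 1.0 means occupied

def local_window_inputs(grid: Grid, row: int, col: int, radius: int) -> List[float]:
    """Flatten the (2*radius+1) x (2*radius+1) cells centred on the robot
    into a single ANN input vector, in row-major order.

    Assumption: cells outside the map are treated as occupied (unsafe)."""
    inputs: List[float] = []
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
                inputs.append(grid[r][c])
            else:
                inputs.append(1.0)  # off-map cell: treat as an obstacle
    return inputs

if __name__ == "__main__":
    grid = [[0.0] * 5 for _ in range(5)]
    grid[1][2] = 1.0  # a mine directly "above" the robot
    print(local_window_inputs(grid, 2, 2, 1))
```

With a radius of 1 this yields 9 inputs; a larger radius lets the network see mines earlier at the cost of more input nodes.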

If you can't form a grid map (for example, because the data is insufficient), then you should try to feed all the available and relevant sensor data to the ANN. However, the data may have to be pre-processed, depending on the sensor noise modelled by your simulation environment. If you have a camera feed (as in the ALVINN model), you can directly follow in its footsteps and train your ANN likewise.
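One possible form of that pre-processing, sketched under assumptions: each raw sensor is a range reading in metres, a short history of readings is kept, and a hypothetical maximum range of 5 m is used. A median over time suppresses isolated noise spikes, and clipping plus scaling puts every input in [0, 1], which generally suits sigmoid-style networks. The names `preprocess_ranges` and `MAX_RANGE_M` are illustrative only; the right filter depends on the noise model of your simulator.

```python
from statistics import median
from typing import List, Sequence

MAX_RANGE_M = 5.0  # hypothetical maximum sensor range, in metres

def preprocess_ranges(history: Sequence[Sequence[float]],
                      max_range: float = MAX_RANGE_M) -> List[float]:
    """history[t][i] is the reading of sensor i at time step t.

    For each sensor: take the median over the time window to reject spikes,
    clip to [0, max_range], then scale to [0, 1] for the ANN."""
    n_sensors = len(history[0])
    inputs: List[float] = []
    for i in range(n_sensors):
        filtered = median(step[i] for step in history)
        clipped = min(max(filtered, 0.0), max_range)
        inputs.append(clipped / max_range)
    return inputs

if __name__ == "__main__":
    # Sensor 0 has one spurious 50 m spike; sensor 1 reads beyond max range.
    window = [[1.0, 5.0], [1.2, 9.0], [50.0, 4.0]]
    print(preprocess_ranges(window))
```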

If this is a real robot, then the choices vary considerably, depending on the robustness and accuracy requirements. I really hope you do not want to build a robust, field-ready minesweeper single-handedly. :) For a small, controlled environment, your options will be very similar to those of a simulated environment; however, the sensor noise will be nastier, and you will have to factor various special cases into your mission planner. Still, it would be advisable to fuse a few other sensors (laser range finder (LRF), ultrasound, etc.) with the vision sensors and use the result as an input to your planner. If nothing else is available, replicate the ALVINN system with only a front-camera input.
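Sensor fusion can be done in many ways; one of the simplest is an inverse-variance weighted average of two range estimates of the same obstacle, for example from an LRF and an ultrasound sensor. This sketch assumes both measurements have independent Gaussian noise with known variances, which is a simplification for real hardware. `fuse_ranges` is a hypothetical helper, not part of any framework.

```python
from typing import Tuple

def fuse_ranges(z_lrf: float, var_lrf: float,
                z_us: float, var_us: float) -> Tuple[float, float]:
    """Inverse-variance weighted fusion of two range measurements.

    Assumes independent Gaussian noise with known variances.
    Returns (fused_range, fused_variance)."""
    w_lrf = 1.0 / var_lrf  # a more precise sensor gets a larger weight
    w_us = 1.0 / var_us
    fused = (w_lrf * z_lrf + w_us * z_us) / (w_lrf + w_us)
    fused_var = 1.0 / (w_lrf + w_us)  # always <= the smaller input variance
    return fused, fused_var

if __name__ == "__main__":
    # The LRF is assumed to be four times more precise than the ultrasound.
    print(fuse_ranges(2.0, 0.01, 2.4, 0.04))
```

The fused estimate leans towards the more precise sensor, and its variance is lower than either input's, which is the main reason fusion helps with noisy real-world data.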

The ANN training methodology will be similar (if you are using only vision). The output will be left/right/straight, etc. Start with 5-7 hidden-layer nodes, since that is what ALVINN uses, and increase to 8-10 at most. That should work. Choose appropriate activation functions for the hidden and output layers.
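To make the suggested architecture concrete, here is a minimal forward-pass sketch of a network with one hidden layer of 5 nodes and three outputs (left/straight/right). The activation choices (sigmoid in the hidden layer, softmax at the output) are one common option, not necessarily what ALVINN used, and the weights are random and untrained; in practice they would be learned with backpropagation on labelled examples. `SteeringNet` and `ACTIONS` are hypothetical names.

```python
import math
import random
from typing import List

ACTIONS = ["left", "straight", "right"]  # assumed discrete steering outputs

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def softmax(zs: List[float]) -> List[float]:
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

class SteeringNet:
    """One hidden layer (sigmoid) and a softmax output over ACTIONS.

    Weights are small random values; this sketch does not train them."""

    def __init__(self, n_inputs: int, n_hidden: int = 5, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
                   for _ in range(len(ACTIONS))]
        self.b2 = [0.0] * len(ACTIONS)

    def forward(self, x: List[float]) -> List[float]:
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        logits = [sum(w * h for w, h in zip(row, hidden)) + b
                  for row, b in zip(self.w2, self.b2)]
        return softmax(logits)  # probabilities over ACTIONS

    def predict(self, x: List[float]) -> str:
        probs = self.forward(x)
        return ACTIONS[probs.index(max(probs))]

if __name__ == "__main__":
    net = SteeringNet(n_inputs=960, n_hidden=5)
    print(net.predict([0.5] * 960))
```

Changing `n_hidden` is how you would try the 5-7 (and up to 8-10) hidden nodes suggested above.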

Question 2

Given its success in the real world, ALVINN seems like a good system to base yours on! As the page you linked to discusses, ALVINN essentially receives an image of the road ahead as its input. At a low level, this is achieved through 960 input nodes representing a 30x32-pixel image. The input value for each node is the grey level of the pixel that node represents (with 0 being a completely white pixel and 1 a completely black pixel, or something along those lines). I'm pretty sure the picture is greyscale, although maybe they're using color now, which could be achieved, for instance, by using three input nodes per pixel: one for the red channel, one for green, and one for blue. Is there a reason you don't think this would be a good input for your system too?
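The pixel-to-input mapping described above can be sketched as follows, assuming 8-bit pixel values (0-255, where 255 is white) and row-major ordering of the nodes. The greyscale version yields 30 x 32 = 960 inputs with 0.0 for white and 1.0 for black; the color variant uses three inputs per pixel, giving 2880. Both function names are hypothetical.

```python
from typing import List, Sequence, Tuple

def grey_to_inputs(image: Sequence[Sequence[int]]) -> List[float]:
    """One input per pixel, row-major. Assumes 8-bit greyscale where
    255 is white: returns 0.0 for white and 1.0 for black."""
    return [1.0 - p / 255.0 for row in image for p in row]

def rgb_to_inputs(image: Sequence[Sequence[Tuple[int, int, int]]]) -> List[float]:
    """Three inputs per pixel (red, green, blue), each scaled to [0, 1]."""
    return [c / 255.0 for row in image for pixel in row for c in pixel]

if __name__ == "__main__":
    grey = [[128] * 32 for _ in range(30)]       # a 30x32 mid-grey image
    print(len(grey_to_inputs(grey)))             # number of input nodes
    color = [[(255, 0, 0)] * 32 for _ in range(30)]
    print(len(rgb_to_inputs(color)))
```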

For more low-level details, see the original paper.