Reinforcement learning is a good machine-learning approach for your problem.
The basic reinforcement learning model consists of:

- a set of environment states `S` (you have a 2D space discretized in some way, representing the dog's current position; if you want to work in continuous 2D space, you may need a neural network to serve as the value-function approximator)
- a set of actions `A` (you mentioned the dog performs sequences of actions, e.g., move, rotate)
- rules for transitioning between states (your dog's position transitions can be modeled as a finite state machine)
- rules that determine the scalar immediate reward `r` of a transition (when the dog reaches the target position, you might want to give it a big reward, while small rewards at intermediate milestones are also welcome)
- rules that describe what the agent observes (the dog might have a limited view, for example, only the 4 or 8 neighboring cells; the figure below shows an example of the dog's current position `P` and the 4 neighboring cells that are viewable to the dog)
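The components above can be sketched in code. This is a minimal illustration, not your actual problem: the 5x5 grid size, the target cell, the reward values, and the 4-neighbor movement are all assumptions made up for the example.

```python
# A minimal sketch of the RL model components, on a hypothetical 5x5 grid.
# Grid size, target cell, and reward values are illustrative assumptions.

GRID = 5                      # states S: all (x, y) cells of a 5x5 grid
ACTIONS = {                   # actions A: the dog can move in 4 directions
    "up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0),
}
TARGET = (4, 4)               # assumed goal cell

def step(state, action):
    """Transition rule: move only if the new cell is inside the grid."""
    dx, dy = ACTIONS[action]
    x, y = state[0] + dx, state[1] + dy
    if 0 <= x < GRID and 0 <= y < GRID:
        state = (x, y)
    # Reward rule r: big reward at the target, small step penalty otherwise.
    reward = 10.0 if state == TARGET else -0.1
    return state, reward

def observe(state):
    """Observation rule: the dog only sees its 4 in-grid neighboring cells."""
    x, y = state
    return [(x + dx, y + dy) for dx, dy in ACTIONS.values()
            if 0 <= x + dx < GRID and 0 <= y + dy < GRID]
```

With a continuous 2D space you would replace the dictionary of discrete cells with a function approximator, but the same four ingredients (states, actions, transitions, rewards) still apply.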
To find the optimal policy, you can start with a model-free technique: Q-learning.
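Here is a self-contained tabular Q-learning sketch. It is model-free: the agent never uses the transition rules directly, only the `(state, action, reward, next state)` samples it observes. The 4x4 grid, rewards, and hyperparameters (`alpha`, `gamma`, `eps`) are illustrative assumptions, not values tuned for your dog problem.

```python
import random
from collections import defaultdict

GRID, TARGET = 4, (3, 3)                     # hypothetical 4x4 grid world
MOVES = [(0, -1), (0, 1), (-1, 0), (1, 0)]   # up, down, left, right

def step(state, a):
    """Clamp moves to the grid; reward 10 at the target, -0.1 per step."""
    x = min(max(state[0] + MOVES[a][0], 0), GRID - 1)
    y = min(max(state[1] + MOVES[a][1], 0), GRID - 1)
    s2 = (x, y)
    return s2, (10.0 if s2 == TARGET else -0.1)

def train(episodes=2000, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)                   # Q[(state, action)] -> value
    for _ in range(episodes):
        s = (0, 0)
        while s != TARGET:
            # epsilon-greedy: mostly exploit, sometimes explore
            if rng.random() < eps:
                a = rng.randrange(4)
            else:
                a = max(range(4), key=lambda a: Q[(s, a)])
            s2, r = step(s, a)
            # Q-learning update: bootstrap from the best next action
            best_next = max(Q[(s2, a2)] for a2 in range(4))
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q

def greedy_path(Q, start=(0, 0), limit=50):
    """Follow the learned policy greedily from start."""
    s, path = start, [start]
    while s != TARGET and len(path) < limit:
        a = max(range(4), key=lambda a: Q[(s, a)])
        s, _ = step(s, a)
        path.append(s)
    return path
```

After training, `greedy_path(train())` walks from the start corner to the target; on this tiny grid the learned policy converges to a shortest route.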