Just break the problem down into its parts. There are:
1) recording from the microphone
2) detecting the start and end times of the tones
3) building these up into a sequence of dots and dashes
4) translating this into text
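Steps 3) and 4) fit in a few lines of Python. This is a minimal sketch, not a complete decoder. It assumes step 2) has already produced a list of (start, end) times for each tone, that the sender uses standard Morse timing (a dash is three dot-lengths, letters are separated by three and words by seven), and that the speed stays roughly constant. `tones_to_symbols` and `symbols_to_text` are hypothetical names chosen for illustration.

```python
# Hypothetical sketch of steps 3) and 4): tone intervals -> dots/dashes -> text.
# Assumes step 2) already produced (start, end) times for each tone.

MORSE = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E",
    "..-.": "F", "--.": "G", "....": "H", "..": "I", ".---": "J",
    "-.-": "K", ".-..": "L", "--": "M", "-.": "N", "---": "O",
    ".--.": "P", "--.-": "Q", ".-.": "R", "...": "S", "-": "T",
    "..-": "U", "...-": "V", ".--": "W", "-..-": "X", "-.--": "Y",
    "--..": "Z", "-----": "0", ".----": "1", "..---": "2", "...--": "3",
    "....-": "4", ".....": "5", "-....": "6", "--...": "7", "---..": "8",
    "----.": "9",
}


def tones_to_symbols(tones):
    """Turn (start, end) tone intervals into a string like '... --- ...'.

    Times may be in any consistent unit (e.g. seconds). Assumes standard
    Morse timing: dot = 1 unit, dash = 3, letter gap = 3, word gap = 7.
    The unit is estimated as the shortest tone, so a message containing
    no dots at all would be misread (a known limitation of this sketch).
    """
    if not tones:
        return ""
    unit = min(end - start for start, end in tones)
    out = []
    for i, (start, end) in enumerate(tones):
        if i > 0:
            gap = start - tones[i - 1][1]
            if gap > 5 * unit:      # midway between 3 and 7 units: new word
                out.append(" / ")
            elif gap > 2 * unit:    # midway between 1 and 3 units: new letter
                out.append(" ")
        # Midway between 1 and 3 units separates dots from dashes.
        out.append("-" if end - start > 2 * unit else ".")
    return "".join(out)


def symbols_to_text(symbols):
    """Translate '... --- ... / ...' into text; unknown codes become '?'."""
    words = symbols.split(" / ")
    return " ".join(
        "".join(MORSE.get(code, "?") for code in word.split())
        for word in words
    )
```

For example, `symbols_to_text(tones_to_symbols([(0, 1), (2, 3), (4, 5), (8, 11), (12, 15), (16, 19), (22, 23), (24, 25), (26, 27)]))` gives `"SOS"`. Estimating the unit from the shortest tone is the simplest option; if the speed drifts, re-estimating it over a sliding window of recent tones would be one way to cope.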
None of these seems particularly difficult on its own. Steps 2) and 3) are probably the hardest, especially if the speed of the signal varies a lot or if you need to handle errors, so perhaps start there, working from some pre-recorded audio files.
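For step 2) on a pre-recorded file, one simple approach is to chop the audio into short frames, measure each frame's loudness (RMS), and treat frames above a threshold as "tone on". This is only one option; a band-pass filter or a Goertzel detector tuned to the tone frequency would cope better with noise. The sketch uses only the standard library and assumes a 16-bit PCM WAV file containing a single, fairly clean tone. `read_wav_mono`, `detect_tones`, and the `frame_ms` and `threshold` defaults are illustrative choices, not taken from any particular library.

```python
import array
import math
import sys
import wave


def read_wav_mono(path):
    """Read a 16-bit PCM WAV file; return (samples, rate), first channel only."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        data = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        data.byteswap()  # WAV samples are stored little-endian
    return list(data[::channels]), rate


def detect_tones(samples, rate, frame_ms=5, threshold=0.3):
    """Return (start, end) times in seconds where the tone is on.

    A frame counts as "on" if its RMS exceeds `threshold` times the RMS
    of the loudest frame. Timing resolution is one frame (`frame_ms`).
    """
    frame = max(1, int(rate * frame_ms / 1000))
    rms = []
    for i in range(0, len(samples), frame):
        chunk = samples[i:i + frame]
        rms.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    if not rms or max(rms) == 0:
        return []  # silence: no tones at all
    level = threshold * max(rms)
    tones, start = [], None
    for i, value in enumerate(rms):
        if value > level and start is None:
            start = i  # tone switched on at this frame
        elif value <= level and start is not None:
            tones.append((start * frame / rate, i * frame / rate))
            start = None
    if start is not None:  # tone still on at the end of the recording
        tones.append((start * frame / rate, len(rms) * frame / rate))
    return tones
```

With a hypothetical file `morse.wav`, `detect_tones(*read_wav_mono("morse.wav"))` returns a list of (start, end) times, which is the input step 3) needs. Normalising the threshold against the loudest frame makes it independent of recording volume, at the cost of being thrown off by a single loud click.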