Extracting information from unstructured text

Question 1

Could I (temporarily) skip 1,2 and 3 and produce a working, but possibly with a high error rate, implementation ? Which corpus should I use ?

I was also thinking of a pre-process step to correct common spelling mistakes or shortcuts like "yess", "c u" and other abominations. Anything already existing I can take advantage of ?

I would suggest you first have a go at processing standard language text. The pre-processing you refer to is an NLP task in its own right, known as normalization. Here is a resource for Twitter normalization: http://www.ark.cs.cmu.edu/TweetNLP/ , additionally, you can use spell checking, sentence boundary detection, ...

THE question, in a nutshell, is: is my approach at solving this problem correct ? If not, what am I doing wrong ?

If you make abstraction of normalization, I think your approach is valid. With regard to automating the annotation process: you can bootstrap the process by using off-the-shelf components first, after which you correct, retrain, and so on, ... during different iterations. To get acceptable results, you will need to do your steps 2, 3, and 4 a couple of times.

If you are interested in understanding the problem and being able to optimize existing solutions, I would suggest you focus on tools that allow you to develop your own models. If you prioritize getting results over being able to develop your own models, I would recommend looking into existing open source text engineering frameworks such as Gate (https://gate.ac.uk/) UIMA (http://uima.apache.org/) and DKPro (which extends UIMA) (https://code.google.com/p/dkpro-core-asl/). All three frameworks wrap existing components, so you have a wide range of possible solutions.

Question 2

I'd suggesting giving a try to NER and Temporal Normalizer. Here is what I see for your example sentence:

You can try the demo here: http://deagol.cs.illinois.edu:8080/