Creating a sentiment analysis tool

Question 1

There is no exact number of examples needed to train a classifier. You can have a large dataset in which all the instances share the same attributes, so your classifier will simply memorize one pattern, or you can have a smaller dataset with well-chosen instances, and your classifier will give better results.

You can train the classifier on the sample dataset provided in the post and use cross-validation to select the best classifier.
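To make the cross-validation step concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the toy sentences are made up (they are not the dataset from the post), the small Naive Bayes class stands in for whatever classifier you actually use, and the names NaiveBayes, cross_val_accuracy and the smoothing values tried are hypothetical. The idea is only to show how k-fold cross-validation scores each candidate so you can pick the best one.

```python
import math
import random
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayes:
    """Tiny multinomial Naive Bayes with additive smoothing (alpha must be > 0)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cross_val_accuracy(make_classifier, texts, labels, k=5, seed=0):
    """Mean accuracy over k folds; requires at least k examples."""
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        clf = make_classifier().fit([texts[i] for i in train],
                                    [labels[i] for i in train])
        correct = sum(clf.predict(texts[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k


# Made-up toy data, only to make the sketch runnable.
TEXTS = [
    "i love this phone", "great service and great food",
    "what a wonderful day", "love the new update", "really happy with it",
    "i hate this phone", "terrible service and bad food",
    "what an awful day", "hate the new update", "really unhappy with it",
]
LABELS = ["pos"] * 5 + ["neg"] * 5

# Candidate classifiers: the same model with different smoothing values.
CANDIDATE_ALPHAS = [0.1, 1.0, 10.0]
cv_scores = {
    alpha: cross_val_accuracy(lambda a=alpha: NaiveBayes(alpha=a), TEXTS, LABELS)
    for alpha in CANDIDATE_ALPHAS
}
best_alpha = max(cv_scores, key=cv_scores.get)
```

With a dataset this small the scores mean very little; on real data you would run the same loop over the sample dataset and keep the candidate with the highest mean accuracy.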

Once you have the best classifier, you can compare it with the classifier provided in the post and choose the better of the two.

Question 2

Datasets are all different and their content often changes (unpredictably) with time. Sometimes you will find that 100 annotated tweets are enough to reach very good performance, because the language use was uniform. Sometimes, tens of thousands of tweets will not be enough. And just when you think your classifier is good, two days pass and what people talk about and how they talk about it changes. That same classifier is now useless. There is a large body of research on active learning and content analysis in changing data streams. Here and here are some papers to start your research.
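As a rough illustration of the active learning idea, here is a minimal sketch of uncertainty sampling, one common strategy: instead of annotating tweets at random, you annotate the ones your current classifier is least sure about. It assumes a binary (positive/negative) classifier that outputs a probability for the positive class. The function name select_most_uncertain and the probability values are made up for the example.

```python
def select_most_uncertain(probabilities, budget):
    """Return the indices of the `budget` items whose predicted
    positive-class probability is closest to 0.5 (least confident)."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:budget]


# Hypothetical classifier outputs for five unlabeled tweets.
probs = [0.95, 0.52, 0.10, 0.45, 0.70]

# Ask a human to annotate only the two most ambiguous tweets.
to_annotate = select_most_uncertain(probs, 2)
```

Here the tweets at indices 1 and 3 would be sent for annotation, since 0.52 and 0.45 are the predictions nearest to 0.5. Repeating this after each retraining keeps the annotation effort focused on the examples the classifier currently finds hardest, which matters when the data keeps changing.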

PS If possible, use ready-made data sets. From personal experience, data annotation is extremely hard. Tweets are very tedious to read, and after you have stared at them for one hour you will make many mistakes and be bored out of your mind.