You are describing a standard text classification problem. In this setting, the features are a (finite) set of words rather than numeric attributes such as sepal length, sepal width, and so on.
As a result, each document is represented over all of these features (every document has the same number of features). However, a single document uses only a small fraction of the vocabulary, so most of the values are zero and the resulting vector is very sparse.
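To make the sparse representation concrete, here is a minimal sketch using only the standard library. It assumes lowercase whitespace tokenization and raw word counts as feature values. The function names (`build_vocabulary`, `to_sparse_vector`, `to_dense_vector`) and the toy corpus are hypothetical, chosen for illustration. In practice you would more likely use a library vectorizer such as scikit-learn's `CountVectorizer`, which returns a SciPy sparse matrix.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map every distinct lowercase token in the corpus to a column index."""
    vocab = {}
    for doc in documents:
        for token in doc.lower().split():  # assumption: naive whitespace tokenization
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def to_sparse_vector(document, vocab):
    """Return {column_index: count} for the words of `document` found in `vocab`.

    Columns that are not stored are implicitly zero, which is what keeps
    the representation sparse.
    """
    counts = Counter(token for token in document.lower().split() if token in vocab)
    return {vocab[token]: n for token, n in counts.items()}


def to_dense_vector(sparse, size):
    """Expand a sparse vector to a full list of length `size` (for inspection only)."""
    dense = [0] * size
    for index, value in sparse.items():
        dense[index] = value
    return dense


if __name__ == "__main__":
    # Hypothetical toy corpus of short reviews.
    corpus = [
        "great movie great acting",
        "boring plot and bad acting",
        "not bad at all",
    ]
    vocab = build_vocabulary(corpus)
    for doc in corpus:
        sparse = to_sparse_vector(doc, vocab)
        print(sparse, to_dense_vector(sparse, len(vocab)))
```

Every document gets a vector of the same length (the vocabulary size), but only the handful of words it actually contains are stored, so storage grows with document length rather than vocabulary size.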
This is the standard way to approach polarity/sentiment prediction, but it would help to learn a bit more about the topic first. I would suggest reading Sebastiani's survey on text classification.
Regards,