Question

I am reading about the Google Prediction API and can't figure out a part of the docs.

From the use cases I am stuck a bit on this part:

Each line can only have one label assigned, but you can apply multiple labels to one example by repeating an example and applying different labels to each one. For example:

"excited", "OMG! Just had a fabulous day!"
"annoying", "OMG! Just had a fabulous day!"

If you send a tweet to this model, you might get a classification something like this: "excited":0.6, "annoying":0.2.

Why would it return "excited":0.6, "annoying":0.2 when there are no additional features for "excited"? Why is "excited" preferred?


Solution

It's not that the tag "excited" is preferred; the number is the probability that the message should in fact be classified as "excited" rather than "annoying."

Suppose I have two sentiment classes: "bullish" and "bearish." I then train a model in the Prediction API with equal amounts of "bullish" and "bearish" training data. When I submit a message to the Prediction API to get its sentiment, the API reads the text and assigns both a "bullish" probability and a "bearish" probability based on the words in the message. The two probabilities sum to 1.
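The idea can be sketched with a toy scorer. This is not the Prediction API's actual algorithm (which Google does not publish); it is a minimal, hypothetical example showing how raw per-label scores get normalized so the probabilities sum to 1. The word lists are made up for illustration.

```python
# Toy sketch (NOT the Prediction API's real model): score a message against
# two labels by counting word overlap with a hypothetical training
# vocabulary, then normalize so the probabilities sum to 1.

BULLISH_WORDS = {"buy", "rally", "up", "gain"}      # hypothetical vocabulary
BEARISH_WORDS = {"sell", "crash", "down", "loss"}   # hypothetical vocabulary

def sentiment(message: str) -> dict:
    words = set(message.lower().split())
    raw = {
        "bullish": 1 + len(words & BULLISH_WORDS),  # +1 keeps scores nonzero
        "bearish": 1 + len(words & BEARISH_WORDS),
    }
    total = sum(raw.values())
    # Normalization is what makes the outputs read as probabilities.
    return {label: score / total for label, score in raw.items()}

probs = sentiment("big rally today, time to buy")
# → {'bullish': 0.75, 'bearish': 0.25}
```

Whatever the raw scores are, dividing by their sum guarantees the per-label probabilities add up to 1, which is why the API's outputs can be compared directly against each other.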

So again, it's not that one label is preferred over the other; rather, the probability of the message being "excited" is three times greater than the probability of it being "annoying" (0.6 vs. 0.2).

OTHER TIPS

If you train the model with just those two examples, the sentence "OMG! Just had a fabulous day!" labeled once as "excited" and once as "annoying", then the only reasonable result when querying the classification of an identical tweet should be "excited":0.5, "annoying":0.5.
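A quick sketch makes this concrete. Again, this is a hypothetical count-based classifier, not the Prediction API itself: when the training text under both labels is identical, the model has no evidence to prefer either label, so each gets probability 0.5.

```python
# Minimal sketch (assumed model, not the Prediction API): identical training
# text under two labels gives each label equal evidence, hence 0.5 each.

from collections import Counter

training = [
    ("excited", "OMG! Just had a fabulous day!"),
    ("annoying", "OMG! Just had a fabulous day!"),
]

def classify(message: str, training: list) -> dict:
    msg_words = set(message.lower().split())
    raw = Counter()
    for label, text in training:
        overlap = len(msg_words & set(text.lower().split()))
        raw[label] += 1 + overlap  # +1 smoothing so zero overlap still scores
    total = sum(raw.values())
    return {label: score / total for label, score in raw.items()}

print(classify("OMG! Just had a fabulous day!", training))
# → {'excited': 0.5, 'annoying': 0.5}
```

The 0.6/0.2 split in the documentation's example therefore implies the real model was trained on more data than just those two lines.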

So the case is probably not perfectly explained in the Google documentation. I suspect they are mainly trying to show that it is possible to associate two different labels with exactly the same sentence.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow