Question

I am working on a project that takes the comments on a given Facebook page and determines an average happiness rating for the page based on its users. Where could I find a formula, rating scheme, or any other literature that would help me measure happiness from words and collections of words?

I suppose I'm looking for something similar to cosine similarity, except that instead of finding similar words it would give the average positivity or happiness associated with a word or collection of words.

I'm not entirely sure this is the correct place for this question, but it has to do with data, and big data within Facebook, so I am hoping to either get an answer or be directed to somewhere I may be able to find one. Thanks in advance for your help.

Solution

You should be looking towards Natural Language Processing, specifically at Sentiment Analysis.

The link I provided is a good starting point for learning about sentiment analysis. If this is what you are looking for, an implementation is available as part of Stanford CoreNLP.

OTHER TIPS

There isn't an absolute solution to your question, but I can suggest some techniques that may help.
If I'm not wrong, you're basically trying to map English text onto human emotions.
There are complete theories written on the subject, and after doing a little research I've come up with two things that might prove handy in your case.

  1. Affective Computing
    The main idea is to collect as much verbal data as you can and apply machine learning (classification) algorithms to it to decide whether a piece of text is related to "happiness" or not. You can design as many useful features as you like. One feature could be a number assigned to each word, such as (happy=1, happier=3, happiest=10); this is just an example. Another feature could detect "nots" and other negations in a sentence, because "not happy" should obviously contribute a negative value. You can think up many similar linguistic features. Once you have a reasonable number of computed features, you can apply a classification algorithm to them and see how well it separates happy text from the rest.
  2. Sentiment Analyzer
    I just found this research paper. It uses Natural Language Processing and, after processing the language, analyzes the emotions of the person or source. You can go through the study yourself.
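
To make the two features from the first item concrete, here is a minimal sketch in Python of a word-weight score with simple negation handling. Everything in it is an assumption for illustration: the `LEXICON` weights reuse the example numbers above (plus a made-up "sad" entry), the `NEGATIONS` set and the three-token negation window are arbitrary choices, and the function names are hypothetical. A real project would use an established sentiment lexicon or a trained classifier.

```python
import re

# Hypothetical word weights; "happy", "happier", "happiest" follow the
# example above, "sad" is added only so a negative word exists.
LEXICON = {"happy": 1.0, "happier": 3.0, "happiest": 10.0, "sad": -1.0}

# Assumed negation words; contractions ending in "n't" are also treated
# as negations in score_comment.
NEGATIONS = {"not", "no", "never"}

# Assumed window: a negation flips lexicon words among the next 3 tokens.
NEGATION_WINDOW = 3


def score_comment(text):
    """Sum the lexicon weights in a comment, flipping the sign of any
    lexicon word that appears shortly after a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    negate_left = 0  # tokens remaining in the current negation window
    for tok in tokens:
        if tok in NEGATIONS or tok.endswith("n't"):
            negate_left = NEGATION_WINDOW
            continue
        weight = LEXICON.get(tok, 0.0)
        if negate_left > 0:
            weight = -weight
            negate_left -= 1
        total += weight
    return total


def average_score(comments):
    """Mean comment score for a page; 0.0 if there are no comments."""
    if not comments:
        return 0.0
    return sum(score_comment(c) for c in comments) / len(comments)
```

Calling `average_score` on a list of comment strings gives one number per page. The per-comment score could also be fed into a classifier as one feature among several, as the item above suggests.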

Why don't you empirically derive happiness from a sample of your own texts (i.e. Facebook comments)? You can recruit reviewers, either "naive" participants or experts in psychology/emotion, depending on your needs. Give the reviewers a scale to rate texts on: either have them classify each text as positive, negative, or neutral, or have them rate it on a Likert-type scale (e.g., on a scale of 1 to 10, with 10 being the maximum, how happy do you think this text is?). With multiple reviewers you can set reliability thresholds (e.g., 2 out of 3 reviewers must agree on the assignment/score, or 2 reviewers resolve a disagreement by consensus). This data then becomes your labelled training data, which you can use to train a model that scores your remaining data.
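
The "2 out of 3 reviewers must agree" rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the `min_agree` parameter, and the shape of the `ratings` dictionary (comment text mapped to a list of reviewer labels) are all hypothetical, and ties are treated as disagreements.

```python
from collections import Counter


def consensus_label(labels, min_agree=2):
    """Return the label chosen by at least `min_agree` reviewers,
    or None if no single label reaches that threshold or there is a tie."""
    if not labels:
        return None
    counts = Counter(labels).most_common()
    top_label, top_count = counts[0]
    # A tie for first place means the reviewers did not settle on one label.
    if len(counts) > 1 and counts[1][1] == top_count:
        return None
    return top_label if top_count >= min_agree else None


def build_training_set(ratings, min_agree=2):
    """Split rated texts into (text, label) training pairs and a list of
    disputed texts that need a consensus discussion."""
    labelled, disputed = [], []
    for text, labels in ratings.items():
        label = consensus_label(labels, min_agree)
        if label is None:
            disputed.append(text)
        else:
            labelled.append((text, label))
    return labelled, disputed
```

The `labelled` pairs are what a classifier would be trained on, while the `disputed` texts go back to the reviewers to resolve by consensus.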

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange