Question

My goal is to rate tweets using Pig Latin. I have three lists of words to use as a dictionary (positive words, negative words and irrelevant words), and I would like to rate a list of tweets against this dictionary by analysing each word of each tweet. The tweets to rate are the ones returned by the search "growth in France".

Example:

  • Positive words: {good, positive, great, ...}
  • Negative words: {bad, recession, ...}
  • Irrelevant words: {Germany, Spain, Hollande, Obama, ...}

A tweet: "Growth in France is back again and in Spain too". Analysing each word gives: growth => positive, France => positive, again => positive, Spain => irrelevant. So this tweet is positive and relevant, because positive + positive + positive + irrelevant = positive.

I tried to write this script ... Sorry for my English.

Solution

An outline of what your script should look like:

  1. Extract the tweet ID and the text field from the Twitter stream.

  2. Add another field to the ID and text by using FLATTEN and TOKENIZE: tokenize the text into words (you can use a simple whitespace tokenizer or something fancier such as NLTK) and break each word out into a new record.

  3. Join the output of (2) with your dictionary to tag each word in your tweets as positive, negative or neutral/irrelevant. You might want to use a signed integer value instead of positive/negative so that the values are easier to add up.
  4. Group the result of (3) by tweet ID.
  5. Calculate the sum of the sentiment values per tweet.

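    -- Load the raw tweets (path and JsonLoader schema are left elided here).
    TweetsRaw = LOAD '...' USING JsonLoader(...);

    ...

    -- Keep only the tweet ID and its text.
    Tweets = FOREACH ... GENERATE TweetID, Text;

    -- Emit one record per word of each tweet.
    TokenizedTweets = FOREACH Tweets GENERATE TweetID, Text, FLATTEN(TOKENIZE(Text)) AS Word;

    -- Dictionary of words, each with a signed integer polarity.
    Dictionary = LOAD '...' AS (DictWord: chararray, polarity: int);

    -- Tag each word with its polarity; words not in the dictionary are dropped by this inner join.
    Labeled_Words = JOIN TokenizedTweets BY Word, Dictionary BY DictWord;

    -- Collect the labeled words of each tweet.
    GroupedSentiment = GROUP Labeled_Words BY (TweetID, Text);

    -- Sum the polarities to get one score per tweet.
    Result = FOREACH GroupedSentiment GENERATE FLATTEN(group), SUM(Labeled_Words.polarity) AS rate;

    DUMP Result;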
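
One possible way to build the `Dictionary` relation from the three word lists in the question is to tag each list with a signed score and union the results. This is only a sketch under assumptions: the three file names are hypothetical, each file is assumed to hold one word per line, and the scores 1, -1 and 0 are just one convention for positive, negative and irrelevant. Pig compares chararrays case-sensitively, so the sketch also lowercases both the dictionary words and the tweet text; otherwise "Growth" in a tweet would not match "growth" in a list.

```pig
-- Hypothetical file names; each file is assumed to contain one word per line.
Positive   = LOAD 'positive_words.txt'   AS (DictWord: chararray);
Negative   = LOAD 'negative_words.txt'   AS (DictWord: chararray);
Irrelevant = LOAD 'irrelevant_words.txt' AS (DictWord: chararray);

-- Attach an assumed signed score to every word and normalize its case.
PositiveScored   = FOREACH Positive   GENERATE LOWER(DictWord) AS DictWord, 1 AS polarity;
NegativeScored   = FOREACH Negative   GENERATE LOWER(DictWord) AS DictWord, -1 AS polarity;
IrrelevantScored = FOREACH Irrelevant GENERATE LOWER(DictWord) AS DictWord, 0 AS polarity;

-- A single dictionary relation with the (DictWord, polarity) schema used by the join.
Dictionary = UNION PositiveScored, NegativeScored, IrrelevantScored;

-- Lowercase the tweet text before tokenizing so that the join keys match.
TokenizedTweets = FOREACH Tweets GENERATE TweetID, Text, FLATTEN(TOKENIZE(LOWER(Text))) AS Word;
```

With this scoring, and assuming growth, France and again are in the positive list and Spain is in the irrelevant list, the example tweet from the question sums to 1 + 1 + 1 + 0 = 3, which is positive. Words that appear in none of the lists (is, back, and, in, too) are dropped by the inner join and do not affect the sum. A score of 0 does not by itself separate irrelevant tweets from neutral ones, so an extra flag field would be needed if that distinction matters.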

License: CC-BY-SA with attribution
Not affiliated with StackOverflow