An outline of what your script should look like:
1. Extract the tweet ID and the text field from the Twitter stream.
2. Add a word field to the ID and text using FLATTEN and TOKENIZE: tokenize the text into words (you can use a simple whitespace tokenizer or something fancier, such as NLTK) and emit each word as a new record.
3. Join the output of (2) with your dictionary to tag each word in your tweets as positive, negative, or neutral/irrelevant. You might want to use a signed integer value instead of positive/negative labels so that it is easier to add them up.
4. Group the result of (3) by tweet ID.
5. Calculate the sum of sentiment per tweet.
TweetsRaw = LOAD '...' USING JsonLoader(...);
...
Tweets = FOREACH ... GENERATE TweetID, Text;
TokenizedTweets = FOREACH Tweets GENERATE TweetID, Text, FLATTEN(TOKENIZE(Text)) AS Word;
Dictionary = LOAD '...' as (DictWord: chararray, polarity: int);
Labeled_Words = JOIN TokenizedTweets BY Word, Dictionary BY DictWord;
GroupedSentiment = GROUP Labeled_Words BY (TweetID, Text);
Result = FOREACH GroupedSentiment GENERATE FLATTEN(group), SUM(Labeled_Words.polarity) AS rate;
DUMP Result;
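
If you want to sanity-check the logic on a handful of tweets before running the Pig job, the same steps can be mirrored in plain Python. This is only a rough sketch: the sample tweets, the dictionary entries, and the function name score_tweets are made up for illustration, and it uses a plain whitespace split rather than Pig's TOKENIZE.

```python
from collections import defaultdict

# Hypothetical sample data standing in for the tweet stream and the dictionary.
tweets = [
    (1, "I love this great phone"),
    (2, "terrible battery and bad screen"),
    (3, "good camera but bad battery"),
    (4, "just landed in Boston"),
]
dictionary = {"love": 1, "great": 1, "good": 1, "terrible": -1, "bad": -1}


def score_tweets(tweets, dictionary):
    """Return {(tweet_id, text): summed polarity} for tweets with a dictionary match."""
    totals = defaultdict(int)
    for tweet_id, text in tweets:
        # Tokenize and flatten: one record per whitespace-separated word.
        for word in text.split():
            # Inner join with the dictionary: words without an entry are dropped.
            if word in dictionary:
                # Group by (tweet_id, text) and sum the signed polarity.
                totals[(tweet_id, text)] += dictionary[word]
    return dict(totals)


result = score_tweets(tweets, dictionary)
for (tweet_id, text), rate in sorted(result.items()):
    print(tweet_id, text, rate)
```

Note that tweet 4 does not appear in the result at all, because none of its words are in the dictionary. The Pig script behaves the same way, since JOIN is an inner join by default.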