Question

I am learning Python and am trying to use it to perform sentiment analysis. I am following an online tutorial from this link: http://www.alex-hanna.com/tworkshops/lesson-6-basic-sentiment-analysis/. I have taken a piece of code as a mapper class, an excerpt of which looks like this:

sentimentDict = {
    'positive': {},
    'negative': {}
}

def loadSentiment():
    with open('Sentiment/positive_words.txt', 'r') as f:
        for line in f:
            sentimentDict['positive'][line.strip()] = 1

    with open('Sentiment/negative_words.txt', 'r') as f:
        for line in f:
            sentimentDict['negative'][line.strip()] = 1

Here, I can see that a new dictionary is created with two keys, positive and negative, but no values.

Following this, two text files are opened and each line is stripped and mapped to the dictionary.

However, what is the = 1 part for? Why is it required, and if it isn't, how could it be removed?


Solution

The loop creates a nested dictionary, and sets all values to 1, presumably to then just use the keys as a way to weed out duplicate values.

You could use sets instead and avoid the = 1 value:

sentimentDict = {}

def loadSentiment():
    with open('Sentiment/positive_words.txt', 'r') as f:
        sentimentDict['positive'] = {line.strip() for line in f}

    with open('Sentiment/negative_words.txt', 'r') as f:
        sentimentDict['negative'] = {line.strip() for line in f}

Note that you don't even need to create the initial dictionaries; you can create the whole set with one statement, a set comprehension.

If other code does rely on dictionaries with the values being set to 1 (perhaps to update counts at a later stage), it'd be more performant to use the dict.fromkeys() class method instead:

sentimentDict = {}

def loadSentiment():
    with open('Sentiment/positive_words.txt', 'r') as f:
        sentimentDict['positive'] = dict.fromkeys((line.strip() for line in f), 1)

    with open('Sentiment/negative_words.txt', 'r') as f:
        sentimentDict['negative'] = dict.fromkeys((line.strip() for line in f), 1)
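
As a quick sanity check, the sketch below compares the explicit loop with dict.fromkeys(). It uses an in-memory io.StringIO object as a stand-in for the word-list file, and the sample words are made up for illustration; they are not taken from the tutorial's files.

```python
import io

# Stand-in for a word-list file (assumption: one word per line, as in the tutorial).
fake_file = io.StringIO("amazing\nawesome\ncool\nawesome\n")

# Explicit loop, as in the question's code.
looped = {}
for line in fake_file:
    looped[line.strip()] = 1

fake_file.seek(0)  # rewind so the same lines can be read again

# dict.fromkeys() builds the same mapping in a single call.
built = dict.fromkeys((line.strip() for line in fake_file), 1)

print(looped == built)  # True
print(sorted(built))    # ['amazing', 'awesome', 'cool']
```

The duplicated word collapses into a single key in both versions, since dictionary keys are unique.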

Looking at your source blog article, however, the dictionaries are only used for membership testing against the keys, so using sets here is much better, and the change is transparent to the rest of the code to boot.
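
To illustrate why the swap is transparent, here is a minimal sketch. The classify() helper and the sample words are hypothetical, not taken from the blog article; the only point is that code relying on `in` behaves the same whether the inner containers are dictionaries or sets.

```python
def classify(word, sentiment):
    """Return 'positive', 'negative', or 'neutral' using only `in` tests."""
    if word in sentiment['positive']:
        return 'positive'
    if word in sentiment['negative']:
        return 'negative'
    return 'neutral'

# The same word lists, once as dicts with placeholder values and once as sets.
as_dicts = {'positive': {'cool': 1, 'awesome': 1}, 'negative': {'bad': 1}}
as_sets = {'positive': {'cool', 'awesome'}, 'negative': {'bad'}}

for word in ['cool', 'bad', 'table']:
    # Both representations give the same answer for every word.
    assert classify(word, as_dicts) == classify(word, as_sets)
    print(word, classify(word, as_sets))
```

Any code that indexes into the inner containers (for example sentiment['positive'][word]) would break with sets, so this only holds for membership tests.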

OTHER TIPS

The point is that this is a nested dict. sentimentDict is a dictionary, and sentimentDict['positive'] and sentimentDict['negative'] are dictionaries as well.

In the loadSentiment() function those inner dicts get populated with items. The words are the keys, the values are always 1.

So you get something like this:

{'negative': {'bad': 1,
              'disgusting': 1,
              'horrible': 1},
 'positive': {'amazing': 1, 
              'awesome': 1, 
              'cool': 1}}

My guess regarding the meaning of the value 1 is that these dictionaries are just initialized here, and later these counts may be increased to signify stronger or weaker sentiment.

This creates a dictionary of dictionaries, so sentimentDict['negative'][some_word] = 1 will presumably produce a dictionary that looks like this:

sentimentDict : {'negative' : { 'some_word' : 1, 'some_other_word' : 1, etc. }}

The keys come from line.strip(), which returns each line of the word file with its surrounding whitespace (including the trailing newline) removed. Each word becomes a key in the 'negative' and 'positive' dicts, respectively (the two files hold different word lists), and each word has a value of 1. This way, you can easily run through a file, take each word in it, look it up in your dictionary, and add up the results:

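sentiment_count = 0
for word in some_file:  # some_file stands for any iterable of words
    if word in sentimentDict['negative']:  # a membership test works on the dict directly
        sentiment_count += sentimentDict['negative'][word]
    # ...and likewise for sentimentDict['positive']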
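
The counting loop above can be turned into a self-contained sketch along these lines. The score_text() name, the sample word lists, and the convention of adding 1 for a positive word and subtracting 1 for a negative one are all assumptions made for illustration; the tutorial may tally things differently.

```python
# Hypothetical word lists; the tutorial loads these from text files instead.
sentiment_words = {
    'positive': {'amazing', 'awesome', 'cool'},
    'negative': {'bad', 'disgusting', 'horrible'},
}

def score_text(text, sentiment=sentiment_words):
    """Return (# positive words) - (# negative words) for a piece of text."""
    score = 0
    # Naive tokenization: lowercase and split on whitespace.
    for word in text.lower().split():
        if word in sentiment['positive']:
            score += 1
        elif word in sentiment['negative']:
            score -= 1
    return score

print(score_text("What an awesome and cool day"))     # 2
print(score_text("horrible service but cool decor"))  # 0
```

Splitting on whitespace leaves punctuation attached to words, so real text would need a more careful tokenizer.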

ONE MORE EDIT: Martijn has the answer. I had originally misread strip() as split() (a common mistake of mine).

From the code at the link you gave (http://www.alex-hanna.com/tworkshops/lesson-6-basic-sentiment-analysis/), the 1 is stored only as a placeholder value for each dictionary key.

The word itself is the key and its value (=1) is not significant.

A better approach would have been a dictionary of sets (or simple lists) instead of the dictionary of dictionaries that the link shows.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow