Question

I am trying to compute the difference between two containers, but they have awkward structures, so I don't know the best way to do it. I cannot alter the type and structure of one container (words), but I can alter the other (the variable delims).

delims = ['on','with','to','and','in','the','from','or']
words = collections.Counter(s.split()).most_common()
# words results in [("a", 9), ("the", 2), ("diplomacy", 1)]

# I want to perform a 'difference' operation on words to remove all the delims words
descriptive_words = set(words) - set(delims)

# Because of the unique structure of words (a list of tuples), it's hard to perform a difference
# on it. What would be the best way to perform a difference? Maybe...

delims = [('on',0),('with',0),('to',0),('and',0),('in',0),('the',0),('from',0),('or',0)]
words = collections.Counter(s.split()).most_common()
descriptive_words = set(words) - set(delims)

# Or maybe
words = collections.Counter(s.split()).most_common()
n_words = []
for w in words:
   n_words.append(w[0])
delims = ['on','with','to','and','in','the','from','or']
descriptive_words = set(n_words) - set(delims)

Solution

How about just modifying words by removing all the delimiters?

words = collections.Counter(s.split())
for delim in delims:
    del words[delim]
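
A minimal, self-contained sketch of this approach follows. The sample string s is an assumption borrowed from another answer on this page, since the question does not show it. The loop is safe even for delimiters that never occur in the text, because Counter overrides deletion so that removing a missing key does not raise KeyError.

```python
import collections

# Sample input is an assumption; the question does not show s.
s = "the a a a a the a a a a a diplomacy"
delims = ['on', 'with', 'to', 'and', 'in', 'the', 'from', 'or']

words = collections.Counter(s.split())
for delim in delims:
    # Counter.__delitem__ ignores missing keys, so no KeyError here.
    del words[delim]

# Back to the list-of-tuples structure used in the question.
descriptive_words = words.most_common()
print(descriptive_words)
```

Calling most_common() after the deletions gives the same list-of-tuples shape the question started with, minus the delimiter words.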

OTHER TIPS

This is how I would do it:

delims = set(['on','with','to','and','in','the','from','or'])
# ...
# In Python 3, filter() returns a lazy iterator; wrap it in list() if you need a list.
descriptive_words = filter(lambda x: x[0] not in delims, words)

This uses the built-in filter function. A viable alternative is a list comprehension:

delims = set(['on','with','to','and','in','the','from','or'])
# ...
descriptive_words = [(word, count) for word, count in words if word not in delims]

Either way, keep delims in a set so that each membership test is O(1) on average.
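
To make the comparison concrete, here is a small sketch that runs both variants on the same data and checks that they agree. The sample string s is an assumption, since the answer elides how words is built.

```python
import collections

s = "the a a a a the a a a a a diplomacy"  # assumed sample input
delims = set(['on', 'with', 'to', 'and', 'in', 'the', 'from', 'or'])

# List of (word, count) tuples, as in the question.
words = collections.Counter(s.split()).most_common()

# filter() is lazy in Python 3, so materialize it with list().
via_filter = list(filter(lambda pair: pair[0] not in delims, words))

# Equivalent list comprehension.
via_comprehension = [(word, count) for word, count in words if word not in delims]

print(via_filter)
print(via_comprehension)
```

Both variants keep the (word, count) tuples intact and preserve the order produced by most_common().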

The simplest answer is to do:

import collections

s = "the a a a a the a a a a a diplomacy"
delims = {'on','with','to','and','in','the','from','or'}
# For older versions of Python without set literals:
# delims = set(['on','with','to','and','in','the','from','or'])
words = collections.Counter(s.split())

not_delims = {key: value for (key, value) in words.items() if key not in delims}
# For older versions of Python without dict comprehensions:
# not_delims = dict(((key, value) for (key, value) in words.items() if key not in delims))

Which gives us:

{'a': 9, 'diplomacy': 1}

An alternative option is to do it pre-emptively:

import collections

s = "the a a a a the a a a a a diplomacy"
delims = {'on','with','to','and','in','the','from','or'}
counted_words = collections.Counter((word for word in s.split() if word not in delims))

Here you filter the list of words before passing it to the Counter, which gives the same counts.
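
As a quick check of that equivalence, the sketch below builds the counts both ways and compares them. The variable names filtered_after and filtered_before are illustrative only and do not appear in the answer.

```python
import collections

s = "the a a a a the a a a a a diplomacy"
delims = {'on', 'with', 'to', 'and', 'in', 'the', 'from', 'or'}

# Filter after counting: the result is a plain dict.
filtered_after = {w: c for w, c in collections.Counter(s.split()).items()
                  if w not in delims}

# Filter before counting: the result is still a Counter.
filtered_before = collections.Counter(w for w in s.split() if w not in delims)

print(dict(filtered_before) == filtered_after)
print(filtered_before.most_common())
```

One practical difference is that the pre-emptive version returns a Counter, so methods such as most_common() remain available, whereas the dict comprehension returns a plain dict.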

If you're iterating through it anyway, why bother converting them to sets?

# Assumes delims is the list of (word, 0) tuples from the question.
dwords = [delim[0] for delim in delims]
words  = [word for word in words if word[0] not in dwords]

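The same filtering can also be written with filter and a lambda function. This is a stylistic choice; it is generally not faster than the list comprehension.

filter(lambda word: word[0] not in dwords, words)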
Licensed under: CC-BY-SA with attribution