Question

I'm trying to teach myself Python and natural language processing using an online tutorial:

http://www.nltk.org/book/ch01.html#sec-automatic-natural-language-understanding

At the end of each section they give practice questions, and for the first section I've gotten through all but one. This one really stumped me.

The tutorial uses set() (a Python built-in, not an NLTK function), which gives the set of all vocabulary items in a list with duplicate words removed.

We have been using sets to store vocabularies. Try the following Python expression: set(sent3) < set(text1). Experiment with this using different arguments to set(). What does it do? Can you think of a practical application for this?

I've been running code with a few different arguments for set, but I just can't see a pattern in the output. Does anybody know what classifies one set as greater than another? And why this might be important?

Thanks!


Solution

For sets, < tests whether set A is a proper subset of set B, that is, every element of A is also in B and A is not equal to B. For example,

In [147]: set('ab') < set('abc') 
Out[147]: True

because set('ab') is a proper subset of set('abc'). In contrast,

In [149]: set('abc') < set('abc') 
Out[149]: False

since set('abc') is not a proper subset of itself.

This operator is documented in the set types section of the Python standard library documentation.
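As for a practical application, here is a minimal sketch of how the subset operators could be used to check a sentence against a known vocabulary. The lists below are small made-up stand-ins for the book's sent3 and text1, not the real NLTK data, and the variable names are only for illustration.

```python
# Hypothetical stand-ins for the book's sent3 and text1 (not the real NLTK data).
sentence = ['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven']
corpus = ['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven',
          'and', 'the', 'earth', '.']

sentence_vocab = set(sentence)
corpus_vocab = set(corpus)

proper = sentence_vocab < corpus_vocab    # proper subset: contained and not equal
subset = sentence_vocab <= corpus_vocab   # subset: contained, equality allowed
same = corpus_vocab < corpus_vocab        # a set is never a proper subset of itself

# Possible practical use: find words of a new sentence that are
# missing from a known vocabulary.
new_sentence = ['In', 'the', 'beginning', 'was', 'Python']
unknown_words = set(new_sentence) - corpus_vocab

print(proper, subset, same, sorted(unknown_words))
```

If the subset test fails, the set difference shows which words are responsible, which can be handy for spotting out-of-vocabulary or misspelled words.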

Other answers

Well, it does not look like a subset test to me. I made the following modifications:

sent3 + ['manoj']        # builds a new list that is discarded; sent3 itself is unchanged
text1.count('manoj')     # returns 0
set(sent3) < set(text1)  # returns True: 'manoj' was never added to sent3
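One thing worth checking in the snippet above: in Python, the + operator on lists returns a new list and leaves both operands unchanged, so the extra word never reaches sent3 unless the result is assigned back. A small sketch, using made-up stand-ins for sent3 and text1 rather than the real NLTK data:

```python
# Hypothetical stand-ins for the book's sent3 and text1.
sent3 = ['In', 'the', 'beginning']
text1 = ['In', 'the', 'beginning', 'God', 'created']

sent3 + ['manoj']                 # result is discarded; sent3 is unchanged
before = set(sent3) < set(text1)

sent3 = sent3 + ['manoj']         # rebind the name to keep the new list
after = set(sent3) < set(text1)

print(before, after)
```

Once the word is actually part of sent3, the comparison becomes False, which is consistent with < being a proper subset test.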
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow