Question

Is there a list of stop words that people usually use to remove punctuation and closed-class words (such as he, she, it) when performing NLP or IR/IE tasks?

I have been trying out topic modeling with Gibbs sampling for word sense disambiguation, and it keeps assigning high probabilities to punctuation and closed-class words simply because they appear frequently in the corpus. The code I am using is here: https://github.com/christianscheible/BNB/blob/master/nb_gibbs.py


Solution

Did you try googling? The top hits I get either contain stop word lists or are Stack Overflow posts that link to such lists.
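Once you have a list, applying it before sampling is straightforward. Here is a minimal sketch, assuming whitespace-tokenized input; the `STOP_WORDS` set and the `remove_stop_words` helper are hypothetical names made up for illustration, and the set is only a tiny subset of what a real list contains. In practice you would load a full list, for example the ones shipped with NLTK or scikit-learn.

```python
import string

# Illustrative subset only (an assumption for this sketch);
# a real stop word list has a few hundred entries.
STOP_WORDS = frozenset(
    {"a", "an", "the", "he", "she", "it", "is", "of", "and", "to", "in"}
)


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Lowercase tokens, strip surrounding punctuation, and drop stop words."""
    kept = []
    for token in tokens:
        # Remove leading/trailing punctuation; a pure punctuation token becomes "".
        word = token.strip(string.punctuation).lower()
        if word and word not in stop_words:
            kept.append(word)
    return kept


if __name__ == "__main__":
    tokens = "She said it is the bank of the river , not a bank .".split()
    print(remove_stop_words(tokens))
```

Run the filter on every document before building the vocabulary for the sampler, so that the removed tokens never get a count in the first place.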

License: CC-BY-SA with attribution