Creating a new corpus with NLTK

https://stackoverflow.com/questions/4951751

python
nlp
corpus
nltk

11-11-2019

Question

I realize the usual answer to a question like this is to go and read the documentation, but I went through the NLTK book and it doesn't give the answer. I'm fairly new to Python.

I have a bunch of .txt files and I want to be able to use the same corpus functions that NLTK provides for the corpora in nltk_data.

I've tried PlaintextCorpusReader but I couldn't get further than:

>>> import nltk
>>> from nltk.corpus import PlaintextCorpusReader
>>> corpus_root = './'
>>> newcorpus = PlaintextCorpusReader(corpus_root, '.*')
>>> newcorpus.words()

How do I segment the sentences in newcorpus using punkt? I tried the punkt functions, but they couldn't read the PlaintextCorpusReader class.
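For reference, here is a minimal sketch of one way this might work: rather than passing the reader to punkt, a punkt tokenizer is handed to PlaintextCorpusReader through its sent_tokenizer argument, and the reader's sents() method does the segmenting. The temporary directory and sample.txt are hypothetical stand-ins for the real .txt files, and the untrained PunktSentenceTokenizer() is an assumption made so the snippet runs without downloading a model; a pre-trained English punkt model from nltk_data would normally handle abbreviations better.

```python
import os
import tempfile

from nltk.corpus import PlaintextCorpusReader
from nltk.tokenize.punkt import PunktSentenceTokenizer

# Hypothetical sample corpus so the sketch runs anywhere; in practice
# corpus_root would point at the directory holding the real .txt files.
corpus_root = tempfile.mkdtemp()
with open(os.path.join(corpus_root, "sample.txt"), "w", encoding="utf-8") as f:
    f.write("This is the first sentence. Here is a second one.\n")

# Assumption: an untrained punkt tokenizer, used here only to avoid a model
# download. It knows no abbreviations, so it splits on most sentence-final periods.
sent_tokenizer = PunktSentenceTokenizer()

# Match only .txt files instead of '.*', and let the reader own the tokenizer.
newcorpus = PlaintextCorpusReader(
    corpus_root, r".*\.txt", sent_tokenizer=sent_tokenizer
)

# sents() yields each sentence as a list of word tokens.
sentences = [list(sentence) for sentence in newcorpus.sents()]
for sentence in sentences:
    print(sentence)
```

The '.*' pattern in the snippet above the question would also pick up any non-text files in the directory, which is why the sketch narrows it to .txt.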

Can you also point me to how I can write the segmented data to text files?
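For the writing part, a rough sketch under some assumptions: each file's raw text is read through the reader's raw() method, split with a punkt tokenizer, and written out one sentence per line using plain Python file I/O. The temporary directories, sample.txt, and the output_dir name are hypothetical, and the untrained PunktSentenceTokenizer() is used only so the snippet runs without a downloaded model.

```python
import os
import tempfile

from nltk.corpus import PlaintextCorpusReader
from nltk.tokenize.punkt import PunktSentenceTokenizer

# Hypothetical input and output directories; replace with real paths.
corpus_root = tempfile.mkdtemp()
output_dir = tempfile.mkdtemp()
with open(os.path.join(corpus_root, "sample.txt"), "w", encoding="utf-8") as f:
    f.write("This is the first sentence. Here is a second one.\n")

# Assumption: untrained punkt tokenizer, chosen to avoid a model download.
sent_tokenizer = PunktSentenceTokenizer()
newcorpus = PlaintextCorpusReader(
    corpus_root, r".*\.txt", sent_tokenizer=sent_tokenizer
)

written = []
for fileid in newcorpus.fileids():
    # Tokenize the raw text so the original spacing inside sentences is kept.
    sentences = sent_tokenizer.tokenize(newcorpus.raw(fileid))
    # Flat corpus assumed: a fileid with subdirectories would need os.makedirs.
    out_path = os.path.join(output_dir, fileid)
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("\n".join(sentences) + "\n")
    written.append(out_path)
```

Tokenizing the raw text rather than joining the word lists from sents() avoids extra spaces appearing before punctuation in the output.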

No correct solution

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow