How to access a text file with Afrikaans language words as an NLTK corpus

https://stackoverflow.com/questions/14203767

python
corpus
nltk

14-01-2022

Question

I have a text file of plain-text sentences in Afrikaans. I would like to run NLTK corpus functions on it, but I can't find any examples of how to do this.

I would like to do things such as:

mytext.concordance("woord")
mytext.similar("woord")

Can anyone help me?

Solution

Managed to figure something out:

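# How to load a text file as a corpus.
import nltk
from nltk.corpus import PlaintextCorpusReader
from nltk.corpus.util import LazyCorpusLoader

# Find the "afrikaans" folder under corpora/ on nltk.data.path and read
# every .txt file in it; the regex skips hidden files (names starting with ".").
afrikaans = LazyCorpusLoader('afrikaans', PlaintextCorpusReader, r'(?!\.).*\.txt')

# sents() relies on NLTK's Punkt sentence tokenizer (English model by
# default), so the Punkt data must be downloaded first.
print(afrikaans.sents()[1])  # second sentence, as a list of words

# Wrap the word list in nltk.Text to get concordance(), similar(), etc.
af = nltk.Text(afrikaans.words())
af.concordance("mense")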

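This assumes your corpus text file is at C:\nltk_data\corpora\afrikaans\afrikaans.txt. LazyCorpusLoader looks for a corpora\afrikaans folder in each directory on nltk.data.path, and C:\nltk_data is one of the default locations on Windows. Any other .txt files in that folder are loaded as well.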
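If you would rather not copy the file into an nltk_data folder, PlaintextCorpusReader can also be pointed at any directory directly. Below is a minimal sketch of that approach; the helper name load_corpus, the sample sentences, and the tokenizer pattern are my own assumptions, not part of the answer above. The custom RegexpTokenizer keeps the Afrikaans article 'n as a single token, which the reader's default word tokenizer (WordPunctTokenizer, pattern \w+|[^\w\s]+) would split into "'" and "n". Only words() is used here, so no Punkt data is needed.

```python
import os
import tempfile

import nltk
from nltk.corpus.reader import PlaintextCorpusReader
from nltk.tokenize import RegexpTokenizer

# Hypothetical sample text; in practice you would point at your own folder.
SAMPLE = "Die mense is vriendelik. Daar is 'n man by die deur. Die mense sing.\n"

# Assumed pattern: match the article 'n first, then words, then punctuation runs.
afrikaans_tokenizer = RegexpTokenizer(r"'n\b|\w+|[^\w\s]+")


def load_corpus(root):
    """Hypothetical helper: read every .txt file under `root` as one corpus."""
    return PlaintextCorpusReader(
        root,
        r"(?!\.).*\.txt",  # same fileid pattern as the LazyCorpusLoader example
        word_tokenizer=afrikaans_tokenizer,
    )


if __name__ == "__main__":
    # Write the sample to a throwaway folder so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "afrikaans.txt"), "w", encoding="utf-8") as f:
            f.write(SAMPLE)
        reader = load_corpus(root)
        text = nltk.Text(reader.words())  # nltk.Text copies the tokens into a list
        print(reader.fileids())
        text.concordance("mense")
        text.similar("mense")
```

To use your own data, call load_corpus with the folder that holds your .txt file(s) and wrap reader.words() in nltk.Text as shown.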

Licensed under: CC-BY-SA with attribution
