Question

I have a corpus and I have a word. For each occurrence of the word in the corpus I want to get a list containing the k words before and the k words after it. I handle this algorithmically fine (see below), but I wondered whether NLTK provides some functionality for this that I missed?

def sized_context(word_index, window_radius, corpus):
    """ Returns a list containing the window_radius words to the left
    and to the right of word_index, not including the word at word_index.
    """
    left_border = max(0, word_index - window_radius)
    right_border = min(len(corpus), word_index + 1 + window_radius)
    return corpus[left_border:word_index] + corpus[word_index + 1:right_border]
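For reference, on a toy corpus the function behaves like this (the definition is repeated, with the boundary clamping condensed, so the snippet runs standalone):

```python
def sized_context(word_index, window_radius, corpus):
    # Same logic as above: clamp the window to the corpus boundaries.
    left = max(0, word_index - window_radius)
    right = min(len(corpus), word_index + 1 + window_radius)
    return corpus[left:word_index] + corpus[word_index + 1:right]

corpus = "the cat sat on the mat".split()
sized_context(2, 2, corpus)   # around "sat" -> ['the', 'cat', 'on', 'the']
sized_context(0, 2, corpus)   # "the" at the start -> ['cat', 'sat']
```

Note the second call: near the corpus boundary the window is simply shorter, which is exactly the behavior the padding in the NLTK solution below is designed to avoid.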

Solution 2

The simplest, nltk-ish way to do this is with nltk.ngrams().

import nltk

words = nltk.corpus.brown.words()
k = 5
for ngram in nltk.ngrams(words, 2*k+1, pad_left=True, pad_right=True,
                         left_pad_symbol=" ", right_pad_symbol=" "):
    if ngram[k].lower() == "settle":
        print(" ".join(ngram))

pad_left and pad_right ensure that all words get looked at: without padding, a word closer than k positions to either end of the corpus would never appear at the center of a full (2k+1)-gram. This matters especially if you don't let your concordances span across sentences (hence: lots of boundary cases).

If you don't want punctuation to count toward the window size, you can strip it before scanning:

import re
words = (w for w in nltk.corpus.brown.words() if re.search(r"\w", w))
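If you want to see what the padded scan is doing without pulling in the Brown corpus, the same pattern can be sketched in plain Python (padded_windows is a hypothetical stand-in for nltk.ngrams with padding, not an NLTK API):

```python
def padded_windows(tokens, k, pad=" "):
    """Yield every (2k+1)-token window; padding both ends guarantees
    each real token appears exactly once at the center position k."""
    padded = [pad] * k + list(tokens) + [pad] * k
    for i in range(k, len(padded) - k):
        yield tuple(padded[i - k:i + k + 1])

tokens = "the cat sat on the mat".split()
k = 2
hits = [w for w in padded_windows(tokens, k) if w[k] == "mat"]
# "mat" is the last token, so the right half of its window is padding:
# [('on', 'the', 'mat', ' ', ' ')]
```

The check `w[k] == "mat"` looks at the center of the window, which is why the ngrams loop above tests `ngram[k]`.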

OTHER TIPS

If you want to use the nltk's functionality, you can use nltk's ConcordanceIndex. To base the width of the display on a number of words rather than a number of characters (the latter being the default for ConcordanceIndex.print_concordance), you can subclass ConcordanceIndex with something like this:

from nltk import ConcordanceIndex

class ConcordanceIndex2(ConcordanceIndex):
    def create_concordance(self, word, token_width=13):
        """Returns a list of contexts for `word`, each at most
        token_width tokens wide (including the word itself)."""
        half_width = token_width // 2
        contexts = []
        for i, token in enumerate(self._tokens):
            if token == word:
                start = max(0, i - half_width)
                context = self._tokens[start:i + half_width + 1]
                contexts.append(context)
        return contexts

Then you can obtain your results like this:

>>> from nltk.tokenize import wordpunct_tokenize
>>> my_corpus = 'The gerenuk fled frantically across the vast valley, whereas the giraffe merely turned indignantly and clumsily loped away from the valley into the nearby ravine.'  # my corpus
>>> tokens = wordpunct_tokenize(my_corpus)
>>> c = ConcordanceIndex2(tokens)
>>> c.create_concordance('valley')  # returns a list of lists, since words may occur more than once in a corpus
[['gerenuk', 'fled', 'frantically', 'across', 'the', 'vast', 'valley', ',', 'whereas', 'the', 'giraffe', 'merely', 'turned'], ['and', 'clumsily', 'loped', 'away', 'from', 'the', 'valley', 'into', 'the', 'nearby', 'ravine', '.']]

The create_concordance method I created above is based upon the nltk's ConcordanceIndex.print_concordance method, which works like this:

>>> c = ConcordanceIndex(tokens)
>>> c.print_concordance('valley')
Displaying 2 of 2 matches:
                                  valley , whereas the giraffe merely turn
 and clumsily loped away from the valley into the nearby ravine .
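To get exactly what the question asks for, separate lists of the k words before and after each occurrence, the same scan can return the two halves directly. A plain-Python sketch (contexts is an illustrative helper, not an NLTK API):

```python
def contexts(tokens, word, k):
    """For each occurrence of word, return a (left_words, right_words)
    pair, each at most k tokens long, clipped at the corpus boundaries."""
    out = []
    for i, tok in enumerate(tokens):
        if tok == word:
            out.append((tokens[max(0, i - k):i], tokens[i + 1:i + 1 + k]))
    return out

tokens = "the gerenuk fled across the vast valley".split()
contexts(tokens, "the", 2)
# [([], ['gerenuk', 'fled']), (['fled', 'across'], ['vast', 'valley'])]
```

The first pair shows the boundary case: "the" opens the corpus, so its left context is empty.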
Licensed under: CC-BY-SA with attribution