Question

I would like to generate a random text using letter frequencies from a book in a .txt file, so that each new character (string.lowercase + ' ') depends on the previous one.

How do I use Markov chains to do so? Or is it simpler to use 27 arrays with conditional frequencies for each letter?

Was it helpful?

Solution

I would like to generate a random text using letter frequencies from a book in a txt file

Consider using collections.Counter to build-up the frequencies when looping over the text file two letters at a time.

How do I use markov chains to do so? Or is it simpler to use 27 arrays with conditional frequencies for each letter?

The two statements are equivalent. The Markov chain is what you're doing. The 27 arrays with conditional frequencies is how you're doing it.

Here is some dictionary based code to get you started:

from collections import defaultdict, Counter
from itertools import ifilter
from random import choice, randrange

def pairwise(iterable):
    it = iter(iterable)
    last = next(it)
    for curr in it:
        yield last, curr
        last = curr

valid = set('abcdefghijklmnopqrstuvwxyz ')

def valid_pair((last, curr)):
    return last in valid and curr in valid

def make_markov(text):
    markov = defaultdict(Counter)
    lowercased = (c.lower() for c in text)
    for p, q in ifilter(valid_pair, pairwise(lowercased)):
        markov[p][q] += 1
    return markov

def genrandom(model, n):
    curr = choice(list(model))
    for i in xrange(n):
        yield curr
        if curr not in model:   # handle case where there is no known successor
            curr = choice(list(model))
        d = model[curr]
        target = randrange(sum(d.values()))
        cumulative = 0
        for curr, cnt in d.items():
            cumulative += cnt
            if cumulative > target:
                break

model = make_markov('The qui_.ck brown fox')
print ''.join(genrandom(model, 20))

OTHER TIPS

If each character only depends on the previous character, you could just compute the probabilities for all 27^2 pairs of characters.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top