Java - Taking character frequencies, creating probabilities, and then generating pseudo-random characters

StackOverflow https://stackoverflow.com/questions/13462486

  •  30-11-2021
  •  | 
  •  

Question

I'm creating a pseudo-random text generator using a Markov model. Basically, I use a hash table to store lists of substrings of order k(the order of the Markov model), then for each substring I have a TreeMap of the suffixes with their frequencies throughout the substring.

I'm struggling with generating the random suffix. For each substring, I have a TreeMap containing all of the possible suffixes and their frequencies. I'm having trouble with using this to create a probability for each suffix, and then generating a pseudo-random suffix based on the probabilities.

Any help on the concept of this and how to go about doing this is appreciated. If you have any questions or need clarification, please let me know.

Was it helpful?

Solution

I'm not sure that a TreeMap is really the best data-structure for this, but . . .

You can use the Math.random() method to obtain a random value between 0.0 (inclusive) and 1.0 (exclusive). Then, iterate over the elements of your map, accumulating their frequencies, until you surpass that value. The suffix that first surpasses this value is your result. Assuming that your map-elements' frequencies all add up to 1.0, this will choose all suffixes in proportion to their frequencies.

For example:

public class Demo
{
    private final Map<String, Double> suffixFrequencies =
        new TreeMap<String, Double>();

    private String getRandomSuffix()
    {
        final double value = Math.random();
        double accum = 0.0;
        for(final Map.Entry<String, Double> e : suffixFrequencies.entrySet())
        {
            accum += e.getValue();
            if(accum > value)
                return e.getKey();
        }
        throw new AssertionError(); // or something
    }

    public static void main(final String... args)
    {
        final Demo demo = new Demo();
        demo.suffixFrequencies.put("abc", 0.3);  // value in [0.0, 0.3)
        demo.suffixFrequencies.put("def", 0.2);  // value in [0.3, 0.5)
        demo.suffixFrequencies.put("ghi", 0.5);  // value in [0.5, 1.0)

        // Print "abc" approximately three times, "def" approximately twice,
        // and "ghi" approximately five times:
        for(int i = 0; i < 10; ++i)
            System.out.println(demo.getRandomSuffix());
    }
}

Notes:

  • Due to roundoff error, the throw new AssertionError() probably actually will happen every so often, albeit very rarely. So I recommend that you replace that line with something that just always chooses the first element or last element or something.
  • If the frequencies don't all add up to 1.0, then you should add a pass at the beginning of getRandomSuffix() that determines the sum of all frequencies. You can then scale value accordingly.
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top