Question

I'm writing a chat bot for a software engineering course in C#.

I'm using Markov chains to generate text, with Wikipedia articles as the corpus. I want the bot to respond to user input in an (at least slightly) intelligent way, but I'm not sure how to do it.

My current thinking is that I'd try to extract keywords from the user's input, then use those to guide the sentence generation. But because of the Markov property, the keywords would have to be the first words in the sentence, which might look silly. Also, for an order-n chain, I'd have to extract exactly n keywords from the user every time.

The data for the generator is a dictionary, where the keys are lists of words, and the values are lists of words, each paired with a weight that depends on how often the word appears after the words in the key. For example:

{[word1, word2, ..., wordn]: [(word, weight), (word, weight), ...]}
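For concreteness, here is a minimal sketch of how such a dictionary might be built and sampled. It is written in Python for brevity rather than C#, and the names build_chain and generate are made up for illustration. It assumes the corpus is already tokenized into a list of words, and it uses tuples as keys because Python lists cannot be dictionary keys.

```python
import random
from collections import Counter, defaultdict


def build_chain(words, n):
    """Map each n-word tuple to a list of (next_word, weight) pairs.

    The weight is the number of times next_word follows the key in the corpus.
    """
    counts = defaultdict(Counter)
    for i in range(len(words) - n):
        key = tuple(words[i:i + n])
        counts[key][words[i + n]] += 1
    return {key: list(counter.items()) for key, counter in counts.items()}


def generate(chain, seed, length, rng=random):
    """Extend an n-word seed by weighted sampling.

    Stops when the output has `length` words or the current key has no followers.
    """
    out = list(seed)
    n = len(seed)
    while len(out) < length:
        followers = chain.get(tuple(out[-n:]))
        if not followers:
            break  # dead end: this key never appeared in the corpus
        candidates, weights = zip(*followers)
        out.append(rng.choices(candidates, weights=weights)[0])
    return out
```

In C# the same shape would presumably be a Dictionary whose key type has value equality (a joined string, for instance), since a plain List compares by reference.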

It works in a command-line test program, but I'm just providing an n word seed for each bit of text it generates.

I'm hoping there's some way I can make the chain prefer words which are nearby words that the user used, rather than seeding it with the first/last n words in the input, or n keywords, or whatever. Is there a way to do that?

Answer

One way to make your chat bot smarter is to identify the topic of the user's input. Assume your Markov model is conditioned on topic as well, so that each key pairs the n preceding words with a topic. Then, to construct your answer, you look up the next word in a dictionary like this:

{([word1, word2, ..., wordn], topic): [(word, weight), (word, weight), ...]}
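A rough sketch of what building and querying such a topic-conditioned dictionary could look like, in Python for brevity. The names build_topic_chain and followers are hypothetical, and it assumes the corpus is already grouped by topic (for example, each Wikipedia article's words stored under its title).

```python
from collections import Counter, defaultdict


def build_topic_chain(corpus_by_topic, n):
    """Build {(n-word tuple, topic): [(word, weight), ...]}.

    corpus_by_topic is assumed to map a topic name to a list of words.
    """
    counts = defaultdict(Counter)
    for topic, words in corpus_by_topic.items():
        for i in range(len(words) - n):
            key = (tuple(words[i:i + n]), topic)
            counts[key][words[i + n]] += 1
    return {key: list(counter.items()) for key, counter in counts.items()}


def followers(chain, key_words, topic):
    """Return the weighted followers of key_words under one topic, or [] if unseen."""
    return chain.get((tuple(key_words), topic), [])
```

The same n words can then lead to different continuations depending on the topic. One obvious cost is that the data gets sparser per topic, so a fallback to a topic-free chain may be needed when followers comes back empty.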

To find the topics, you can start with WikipediaMiner. For instance, below are the topics and their weights returned by the wikify API for this sentence:

Statistics is so hard. Do you have some good tutorial of probability theory for a beginner?

[{'id': 23542, 'title': 'Probability theory', 'weight': 0.9257584778725553},
 {'id': 30746, 'title': 'Theory', 'weight': 0.7408577501980528},
 {'id': 22934, 'title': 'Probability', 'weight': 0.7089442931022307},
 {'id': 26685, 'title': 'Statistics', 'weight': 0.7024251356953044}]
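Given output shaped like the list above, choosing which topic to condition on could be as simple as the sketch below (Python; pick_topic and known_topics are hypothetical names, with known_topics standing for whatever topics the model actually has data for).

```python
def pick_topic(detected, known_topics):
    """Return the highest-weight detected title that the model knows, else None."""
    for entry in sorted(detected, key=lambda e: e["weight"], reverse=True):
        if entry["title"] in known_topics:
            return entry["title"]
    return None


# The topics detected for the example sentence above.
detected = [
    {'id': 23542, 'title': 'Probability theory', 'weight': 0.9257584778725553},
    {'id': 30746, 'title': 'Theory', 'weight': 0.7408577501980528},
    {'id': 22934, 'title': 'Probability', 'weight': 0.7089442931022307},
    {'id': 26685, 'title': 'Statistics', 'weight': 0.7024251356953044},
]
```

If the model only had data for 'Probability' and 'Statistics', this would settle on 'Probability', since it has the higher weight of the two.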

The identified keywords are probably also good candidates for seeds. However, question answering is not that simple. Markov-based sentence generation cannot understand the question at all; the best it can do is produce related content. Just my 2 cents.
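To illustrate the seeding idea, here is a small sketch (Python; seed_candidates and choose_seed are made-up names). It assumes a plain chain of the form {n-word tuple: [(word, weight), ...]} and picks a starting key that contains any keyword, so you do not need exactly n keywords from the user.

```python
import random


def seed_candidates(chain, keywords):
    """Return the chain keys that contain at least one keyword (case-insensitive)."""
    wanted = {k.lower() for k in keywords}
    return [key for key in chain if wanted & {w.lower() for w in key}]


def choose_seed(chain, keywords, rng=random):
    """Pick a random key containing a keyword.

    Falls back to any key when no keyword matches, and returns None for an empty chain.
    """
    candidates = seed_candidates(chain, keywords) or list(chain)
    return rng.choice(candidates) if candidates else None
```

Multi-word titles such as 'Probability theory' would have to be split into single words before being passed in as keywords. The keyword still ends up within the first n words of the output, so this only softens the "keywords first" problem rather than solving it.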

Licensed under: CC-BY-SA with attribution