Question

This is probably an extremely difficult question to answer, but here is my question anyway.

I am wondering what the best method is for determining the topic of a conversation. The conversation takes place over IRC. I have written chatbots in the past that have interpreted the topic pretty well, but not as accurately as I would like.

In the past I have had to make lists of common words such as "the" and "a" and then filter them out of the topic array. I don't know if this is the correct way to do it, though.
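
For reference, a minimal sketch of the filtering approach described above (such lists are usually called "stop words"), followed by a plain frequency count. The tiny STOP_WORDS set, the regex tokenizer and the function name topic_candidates are illustrative assumptions, not the asker's actual code:

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption); real lists run to a few hundred words.
STOP_WORDS = {"the", "a", "an", "and", "or", "is", "are", "to", "of", "in",
              "it", "i", "you", "that", "this", "for", "on", "with", "was"}


def topic_candidates(messages, top_n=3):
    """Count non-stop-words across chat messages and return the most common ones."""
    counts = Counter()
    for message in messages:
        # Lowercase, then treat runs of letters, digits and apostrophes as words.
        words = re.findall(r"[a-z0-9']+", message.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    # Ties keep the order in which words were first seen.
    return [word for word, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    log = [
        "Has anyone tried the new Python release?",
        "Yes, the Python release fixed my asyncio bug",
        "asyncio in Python is still confusing to me",
    ]
    print(topic_candidates(log))  # → ['python', 'release', 'asyncio']
```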

I am wondering if there is a frequency algorithm of some sort that would let me work out which word is the current topic of conversation.

Any suggestions as to how this can be achieved will be greatly appreciated. Thanks.


Solution

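There is something called Zipf's Law: in natural-language text, a word's frequency is roughly inversely proportional to its rank in the frequency table, so the most common word occurs about twice as often as the second most common, three times as often as the third, and so on. It only holds reliably for text written by humans, and the text has to be reasonably long.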
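
To check the law against a chat log, one can rank words by frequency and see whether rank × frequency stays roughly constant. A minimal sketch, assuming plain text on standard input; the regex tokenizer and the function name rank_frequency are illustrative choices, not part of the answer:

```python
import re
import sys
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, frequency, rank * frequency) tuples, most frequent first."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    ranked = Counter(words).most_common()
    # Under Zipf's law the last column should stay roughly constant,
    # but only on reasonably long, human-written text.
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(ranked, start=1)]


if __name__ == "__main__":
    # Print the top 20 rows for whatever text is piped in.
    for row in rank_frequency(sys.stdin.read())[:20]:
        print(*row)
```

Run it as `python zipf.py < channel.log` (both filenames are placeholders). On a few lines of IRC chatter the last column will jump around; it only settles down as the log grows, which is the length caveat above.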

The result of running a text through a keyword-extraction algorithm based on this law would be a set of keywords (5%-7% of the original text) that closely describe the topic of the text.
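
The answer does not name a specific algorithm. One classic way to use the rank-frequency curve, in the spirit of Luhn's upper and lower cutoffs, is to drop the very top ranks (mostly function words like "the" and "a"), drop words seen too rarely to matter, and keep the middle band. A minimal sketch under those assumptions; the function name zipf_keywords and all three default parameters are hypothetical, and reading "5%-7% of the original text" as a share of distinct words is also an assumption:

```python
import math
import re
from collections import Counter


def zipf_keywords(text, upper_fraction=0.05, keep_fraction=0.06, min_count=2):
    """Pick mid-frequency words as topic keywords (Luhn-style cutoffs).

    upper_fraction: share of top-ranked distinct words dropped as too common.
    keep_fraction:  share of distinct words returned as keywords.
    min_count:      words seen fewer times than this are dropped as too rare.
    All defaults are illustrative assumptions, not tuned values.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    ranked = Counter(words).most_common()
    if not ranked:
        return []
    # Upper cutoff: skip the most frequent ranks (function words live here).
    start = math.ceil(len(ranked) * upper_fraction)
    # Lower cutoff: ignore words that occur too rarely to indicate a topic.
    middle = [word for word, count in ranked[start:] if count >= min_count]
    # Return roughly keep_fraction of the distinct words, but at least one.
    n_keep = max(1, round(len(ranked) * keep_fraction))
    return middle[:n_keep]
```

For an IRC channel you would presumably feed it a rolling window of recent messages rather than the whole log, so the keywords track the current conversation; the window has to stay long enough for the frequency curve to mean anything.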

OTHER TIPS

Natural language processing can be very difficult, but you can still get some results with just the fundamentals. Daniel Gabriel's suggestion to use Zipf's Law is a good one.

An overview book like Manning and Schütze's Foundations of Statistical Natural Language Processing may be helpful: it explains many common techniques and will point you toward more specialized resources.

(Their Introduction to Information Retrieval is excellent too, and a free PDF is available on the book's website.)

Licensed under: CC-BY-SA with attribution