Check if one phrase relates to another using OpenNLP

https://stackoverflow.com//questions/22028635

21-12-2019

Question

I've created a VB.NET application with a speech interface (speech recognition and TTS), basically my own Jarvis.

I've decided to create a plug-in based app where the plug-ins will provide the "functionality" for the core app.

Each plug-in will have a property called "Description" describing what the plug-in can do. For example, one might be "can tell the current time/date" and another "manages calendar/reminders".

I want the core app to take the user input (converted to text) and compare it to the Description property of each plug-in and determine if that plug-in can "satisfy" the user input.

I know that each plug-in could hold a list of keywords that trigger it, but then a keyword can show up in the user's input without being meant to trigger that plug-in. Compare:

"what is the current time", where "time" is a recognized keyword for telling the time and that is the user's intent, with:

"not this time", where "time" is again a recognized keyword for telling the time but that is not the user's intent.

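A minimal Python sketch of the problem; the keyword table and the `keyword_matches` function are hypothetical, not taken from the actual app:

```python
import re

# Hypothetical keyword table: plug-in name -> trigger keywords.
KEYWORDS = {
    "clock": {"time", "date"},
    "calendar": {"calendar", "reminder"},
}

def keyword_matches(text):
    """Return the sorted names of plug-ins with at least one keyword in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(name for name, keys in KEYWORDS.items() if words & keys)

print(keyword_matches("what is the current time"))  # ['clock']
print(keyword_matches("not this time"))             # ['clock'], a false trigger
```

Both inputs trigger the same plug-in, because plain keyword lookup cannot see intent.
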
I currently have my sights set on using OpenNLP, as I can integrate it easily into my project, but I'm not sure what steps I must follow to achieve my goal.

Solution

Welcome to SO... Seems like you could use the OpenNLP doccat (document categorizer) capability: you would build a model from your plugin descriptions and then "classify" the input text against that model, which returns a probability distribution over your plugins. You could keep adding samples of user input over time and build a pretty nice model. You could even make it dynamic: if a user input receives a score above some threshold, store that sample somewhere tagged with the right class label, and every so often rebuild the model.
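
A rough sketch of that feedback loop in Python, assuming the classifier hands back a dict of class label to probability. `record_if_confident`, the 0.8 threshold, and the sample scores are all made up for illustration:

```python
def record_if_confident(text, scores, threshold, store):
    """Append (label, text) to store when the best score reaches threshold.

    scores maps class label -> probability, as a classifier might return it.
    Returns the best label if the sample was stored, otherwise None.
    """
    if not scores:
        return None
    label = max(scores, key=scores.get)
    if scores[label] >= threshold:
        store.append((label, text))
        return label
    return None

samples = []  # stand-in for a file or database table of training samples
record_if_confident("what time is it", {"clock": 0.91, "calendar": 0.09}, 0.8, samples)
record_if_confident("not this time", {"clock": 0.55, "calendar": 0.45}, 0.8, samples)
print(samples)  # [('clock', 'what time is it')]
```

Only the confident classification is kept, so the stored samples can be fed into the next model rebuild.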

Not sure if the .NET version of OpenNLP supports doccat, so you could also take a more basic machine learning approach (vectorization along with something like cosine similarity), or you might just be able to index the descriptions and use something like Lucene or Solr, or a database like MySQL/Postgres/MSSQL/Oracle, to return a ranked list of matches...
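
For the vectorization route, here is a minimal Python sketch using raw word counts and cosine similarity. The plugin names and the helper functions are hypothetical; the two descriptions are the ones from the question:

```python
import math
import re
from collections import Counter

# Hypothetical plugin table: name -> Description property.
DESCRIPTIONS = {
    "clock": "can tell the current time/date",
    "calendar": "manages calendar/reminders",
}

def vectorize(text):
    """Bag-of-words vector: lowercase word -> count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_plugins(user_text):
    """Return (score, plugin name) pairs, best match first."""
    query = vectorize(user_text)
    scored = [(cosine(query, vectorize(desc)), name)
              for name, desc in DESCRIPTIONS.items()]
    return sorted(scored, reverse=True)

ranked = rank_plugins("what is the current time")
print([name for _, name in ranked])  # ['clock', 'calendar']
```

Raw counts still see the word "time" in "not this time", so on its own this does not fix the intent problem; term weighting or real user samples would be needed for that.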

Anyway, the basic steps for using OpenNLP are:

1. Build a doccat model using samples of user input and/or the plugin descriptions themselves. See the OpenNLP docs for the format, but in short it is `class_label<space><bag of words>\n`, where each sample starts with its class label and ends with a newline.
2. Instantiate a document categorizer with the model (`DocumentCategorizerME` in the Java API) and call `categorize` on the input text. The models can be big, so instantiate them lazily.
3. The categorizer gives you a score (a double) for each category as the "fit"; `scoreMap` returns those scores keyed by class label.
4. At this point I recommend capturing user input somewhere so you can continuously make the models better.
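
The training data format from step 1 can be sketched as below. This is plain Python that only builds the text of the training file and does not call OpenNLP; `to_doccat_line`, `training_text`, and the samples are hypothetical:

```python
def to_doccat_line(label, text):
    """Format one sample as '<label> <tokens>' on a single line."""
    if not label or any(ch.isspace() for ch in label):
        raise ValueError("label must be a single token without whitespace")
    tokens = text.split()  # also collapses embedded newlines and tabs
    return label + " " + " ".join(tokens)

def training_text(samples):
    """Join (label, text) samples into training file content, one per line."""
    return "".join(to_doccat_line(label, text) + "\n" for label, text in samples)

samples = [
    ("clock", "can tell the current time/date"),
    ("clock", "what is the current time"),
    ("calendar", "manages calendar/reminders"),
]
print(training_text(samples), end="")
```

Collapsing whitespace matters because each sample has to stay on one line; a stray newline inside the user input would otherwise split it into two samples.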

These four steps assume that just indexing the descriptions is not good enough for some reason. BTW, I have never used the .NET version of OpenNLP, but I have put OpenNLP Java functionality behind web services and called them from .NET...

HTH

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow