Question

Hi I did a short course in AI and we designed a chatbot based on AIML and python. I have a new task to design some form of Semantic search engine. I want people to be able to navigate data or search for questions and I give them results. Initially it will be for specific topic e.g. transporation and geography. Some sample input from a user:

  1. How much will it cost for me to get from x to y?

    Ans: It will cost you 26$

  2. How far is x from z?

    Ans: It is 25 Miles

  3. A user can add facourite routes so they can simply type in, Add favourite roAnd the user will then be asked to enter the f routes.

    Ans: Are you asking to add an entry to your favourite routes?

    User:yes.

    Ans: Please enter a favourite route.

  4. Show my common routes.

    Ans: Your common routes are x,y and z.

So the data being searched may be specific to a user hence may have to use a database. Some data is external maybe envoke google maps to enquire on the distances. Some questions may simply require a response from a chatbot.

So what should i do upon user input? Tokenize it, stem it, parse it?

I was hoping to use AIML somewhere but an article i read http://knytetrypper.proboards.com/index.cgi?board=gbot&action=print&thread=285 . Says AIML is only good for pattern matching. Someone please point me in the correct direction. I downloaded NLTK, it seems usefull but i don't know if it on its own can do what I require.

Any similar projects articles?

Was it helpful?

Solution

This is a really hard problem. If you restrict the inputs to a very small space, it can be doable though. At that point though you are just using a vocabulary and have basic commands for each possible query.

There are several ways to discriminate between types of queries: 1) parse and try to use all that info 2) partial parse/pos tag- find verbs 3) machine learning/classification approach, using pos as feature, distances, words/constructions like 'to'/'from'

... and then you can try to pull out the query params once you've classified the query correctly.

I would avoid doing a parse until you are very sure what kind of query it is- a classification approach is the best first step, and for messing with that NLTK is very useful.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top