Question

Ideally using regex, in Python. I'm making a simple chatbot, and it's currently having problems responding correctly to phrases like "I love you" (the grammar handler throws back "You love I" when it should give back "You love me").

In addition, if you could think of good phrases to throw into this grammar handler, that'd be great. I'd love some testing data.

If there's a good list of transitive verbs out there (something like a "top 100 used") it may be acceptable to use that and special case the "transitive verb + you" pattern.

Solution

What you want is a syntactic analyser (aka parser). This can be done with a rule-based system, as described by @Dr.Kameleon, or statistically. There are many implementations out there, one being the Stanford parser. These will generally tell you the syntactic role of a word (e.g. "You" is the subject in "You are here", and "you" is the object in "She likes you"). How you use that information to turn statements into questions is a whole different can of worms. For English, you can get a fairly simple rule-based system to work OK.

Other tips

Well, what you're trying to implement is definitely very challenging.

Logic

As a start, I would look a bit into the grammar rules first.

Basic sentence structure :

  • SUBJECT + TRANSITIVE VERB + OBJECT
  • SUBJECT + INTRANSITIVE VERB

(Of course, we could also talk about "Subject+Verb+Indirect Object+Direct Object" formats, etc (e.g. I give you the ball) but this would get too complicated for now...)

Obviously, this scheme is VERY simplistic, but let's stick to that for now.

Then assume (another over-simplistic assumption) that each part is a single word.

So basically you have the following sentence scheme :

WORD WORD WORD

which could be generally matched using a regex like :

(\w+)\s+(\w+)(?:\s+(\w+))?

Explanation :

(\w+)            # first word (=subject)
\s+              # one or more whitespace characters
(\w+)            # second word (=verb)
(?:\s+(\w+))?    # (optional) whitespace plus third word (=object - if the verb is transitive)
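
A minimal sketch of how such a pattern might be applied in Python. The function name `split_sentence` is made up for illustration. Note one assumption: the whitespace before the third word is placed inside the optional group and the pattern is anchored at both ends, so a two-word sentence with no object still matches and longer sentences are rejected.

```python
import re

# Three-word pattern: subject, verb, and an optional object.
# The whitespace before the object sits inside the optional group,
# so "I sleep" (no object) still matches.
SENTENCE = re.compile(r"^\s*(\w+)\s+(\w+)(?:\s+(\w+))?\s*$")


def split_sentence(text):
    """Return (subject, verb, object) or None if the text does not fit."""
    match = SENTENCE.match(text)
    if match is None:
        return None
    return match.groups()


print(split_sentence("I love you"))           # ('I', 'love', 'you')
print(split_sentence("I sleep"))              # ('I', 'sleep', None)
print(split_sentence("I give you the ball"))  # None
```

Anything longer than three words falls outside this simplified scheme and returns `None`, which is where a real parser would have to take over.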

Now, obviously to formulate sentences like "You love me" and not "You love I", your algorithm should also "understand" that :

  • The third part of the sentence has the role of the Object
  • Since "I" is a personal pronoun (used only in the nominative case : "as a subject"), we should use its "accusative form" (=as an object); so, for this purpose, you may also need e.g. personal pronoun tables (nominative - possessive - accusative) like :
  • I - my - me
  • You - your - you
  • He - his - him
  • etc...
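
The points above could be sketched roughly as follows, assuming the simplified three-word scheme. The names `SUBJECT_SWAP`, `OBJECT_SWAP` and `reflect` are hypothetical, and the tables only cover first and second person. The idea is that the position in the sentence decides which case to use, so "you" becomes "I" as a subject but "me" as an object.

```python
import re

# Subject, verb, optional object (each a single word).
SENTENCE = re.compile(r"^\s*(\w+)\s+(\w+)(?:\s+(\w+))?\s*$")

# Person swap tables, keyed by grammatical role (illustrative, not complete).
SUBJECT_SWAP = {"i": "you", "you": "I", "we": "you"}   # nominative forms
OBJECT_SWAP = {"me": "you", "you": "me", "us": "you"}  # accusative forms


def reflect(text):
    """Swap first and second person, picking the case by word position."""
    match = SENTENCE.match(text)
    if match is None:
        return None  # sentence does not fit the three-word scheme
    subject, verb, obj = match.groups()
    parts = [SUBJECT_SWAP.get(subject.lower(), subject), verb]
    if obj is not None:
        parts.append(OBJECT_SWAP.get(obj.lower(), obj))
    sentence = " ".join(parts)
    # Capitalise the first letter without touching the rest.
    return sentence[0].upper() + sentence[1:]


print(reflect("I love you"))     # You love me
print(reflect("You love me"))    # I love you
print(reflect("She likes you"))  # She likes me
```

This sketch does not touch the verb, so agreement is not handled: "I am happy" would come back as "You am happy". A verb table (am/are, was/were) would be the next thing to add.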

Just a few ideas... (purely out of my enthusiasm for linguistics :-))


Data

As for the wordlists you are interested in, just a few samples :

Licensed under: CC-BY-SA with attribution