Detect whether the word "you" is a subject or object pronoun based on sentence context.
29-05-2021
Question
Ideally using regex, in Python. I'm making a simple chatbot, and it currently has problems responding correctly to phrases like "I love you": its grammar handler throws back "You love I" when it should give back "You love me".
In addition, if you can think of good phrases to throw into this grammar handler, that would be great. I'd love some testing data.
If there's a good list of transitive verbs out there (something like a "top 100 most used"), it may be acceptable to use that and special-case the "transitive verb + you" pattern.
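A minimal sketch of that special case in Python. It assumes a small hand-picked verb set (`TRANSITIVE_VERBS` is a hypothetical placeholder for a real "top 100" list, and `reflect` is an illustrative name, not a library function): a "you" that directly follows a listed verb is treated as an object and becomes "me", every other "you" becomes "I", and "I"/"me" both become "you".

```python
import re

# Hypothetical placeholder for a real "top 100 transitive verbs" list.
TRANSITIVE_VERBS = {"love", "hate", "like", "see", "need", "want", "know", "help"}

_PRONOUN = re.compile(r"\b(I|me|you)\b", re.IGNORECASE)
# Last word before a given position (trailing spaces/punctuation allowed).
_PREV_WORD = re.compile(r"(\w+)\W*$")


def reflect(sentence: str) -> str:
    """Swap first and second person, treating 'you' as an object ('me')
    only when it directly follows a verb in TRANSITIVE_VERBS."""

    def swap(m):
        word = m.group(1).lower()
        if word in ("i", "me"):
            return "you"
        # word == "you": look at the word just before this match.
        prev = _PREV_WORD.search(sentence, 0, m.start())
        if prev and prev.group(1).lower() in TRANSITIVE_VERBS:
            return "me"
        return "I"

    result = _PRONOUN.sub(swap, sentence)
    # Re-capitalise the first letter, since "I" -> "you" can start the reply.
    return result[:1].upper() + result[1:]


print(reflect("I love you"))  # You love me
```

Verb agreement is not handled ("You are here" comes back as "I are here"), and any verb missing from the set falls back to the subject reading of "you".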
Solution
What you want is a syntactic analyser (aka parser). This can be done with a rule-based system, as described by @Dr.Kameleon, or statistically. There are many implementations out there, one being the Stanford parser. These will generally tell you what the syntactic role of a word is (e.g. subject in "You are here", or object in "She likes you"). How you use that information to turn statements into questions is a whole different can of worms, but for English you can get a fairly simple rule-based system to work OK.
Other tips
Well, what you're trying to implement is definitely very challenging.
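As a rough illustration of the rule-based route (a toy heuristic, not a real parser), here is a sketch that assumes a single-clause sentence and a tiny hypothetical verb lexicon `VERBS`: a "you" that appears before the first known verb is labelled a subject, one after it an object.

```python
import re

# Hypothetical mini-lexicon; a real system would use a POS tagger or parser.
VERBS = {"am", "are", "is", "like", "likes", "love", "loves", "see", "sees"}


def you_roles(sentence):
    """Label each 'you' in a single-clause sentence as subject or object,
    depending on whether it comes before or after the first known verb."""
    roles = []
    seen_verb = False
    for word in re.findall(r"[a-z']+", sentence.lower()):
        if word in VERBS:
            seen_verb = True
        elif word == "you":
            roles.append("object" if seen_verb else "subject")
    return roles


print(you_roles("You are here"))   # ['subject']
print(you_roles("She likes you"))  # ['object']
```

Embedded clauses already break it: in "I see you like me", "you" is the subject of "like" but gets labelled an object, which is where a real parser earns its keep.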
Logic
As a start, I would first look a bit into the grammar rules.
Basic sentence structure:
- SUBJECT + TRANSITIVE VERB + OBJECT
- SUBJECT + INTRANSITIVE VERB
(Of course, we could also talk about "Subject + Verb + Indirect Object + Direct Object" formats, etc. (e.g. "I give you the ball"), but this would get too complicated for now...)
Obviously, this scheme is VERY simplistic, but let's stick to that for now.
Then comes another over-simplistic assumption: that each part is a single word.
So basically you have the following sentence scheme:
WORD WORD WORD
which could generally be matched using a regex like:
(\w+)\s+(\w+)(?:\s+(\w+))?
Explanation:
(\w+)          # first word (=subject)
\s+            # one or more spaces
(\w+)          # second word (=verb)
(?:            # optional group, present only if the verb is transitive:
  \s+          #   one or more spaces
  (\w+)        #   third word (=object)
)?
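In Python, the pattern can be applied with `re.fullmatch`, so the whole sentence has to fit the scheme. This sketch writes the optional object as `(?:\s+(\w+))?`, so the whitespace before it is optional too and two-word sentences like "I sleep" still match; `split_svo` is a hypothetical helper name.

```python
import re

# SUBJECT VERB [OBJECT]: the object group is optional, and so is the
# whitespace before it, so two-word sentences ("I sleep") match too.
SVO = re.compile(r"(\w+)\s+(\w+)(?:\s+(\w+))?")


def split_svo(sentence):
    """Return (subject, verb, object_or_None), or None if the sentence
    does not consist of exactly two or three words."""
    m = SVO.fullmatch(sentence.strip())
    return m.groups() if m else None


print(split_svo("I love you"))  # ('I', 'love', 'you')
print(split_svo("I sleep"))     # ('I', 'sleep', None)
```

Anything longer, such as "I give you the ball", returns `None`, which is consistent with the one-word-per-slot assumption above.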
Now, obviously, to formulate sentences like "You love me" and not "You love I", your algorithm should also "understand" that:
- The third part of the sentence has the role of the Object
- Since "I" is a personal pronoun used only in the nominative case (i.e. as a subject), we should use its accusative form (i.e. as an object) instead; for this purpose, you may also need personal pronoun tables (subject - possessive - object) like:
- I - my - me
- You - your - you
- He - his - him
- etc...
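Putting the slots and the pronoun table together, here is a minimal sketch of the chatbot's perspective swap, assuming two- or three-word sentences only (`CHART`, `SWAP` and `reflect_svo` are illustrative names): each pronoun is normalised to its subject form, swapped between first and second person, and then emitted in the subject or object form depending on the slot it occupies.

```python
import re

# Pronoun chart as (subject, possessive, object) rows, like "I - my - me".
CHART = [
    ("I", "my", "me"),
    ("you", "your", "you"),
    ("he", "his", "him"),
    ("she", "her", "her"),
    ("we", "our", "us"),
    ("they", "their", "them"),
]
# Map subject and object forms back to the row's subject form.
TO_SUBJECT = {}
for subj, _poss, obj in CHART:
    TO_SUBJECT[subj.lower()] = subj
    TO_SUBJECT[obj.lower()] = subj
OBJECT_FORM = {subj: obj for subj, _poss, obj in CHART}
# Chatbot perspective swap (first <-> second person), on subject forms.
SWAP = {"I": "you", "you": "I"}

SVO = re.compile(r"(\w+)\s+(\w+)(?:\s+(\w+))?")


def to_role(word, role):
    """Swap person if needed, then inflect for the 'subject'/'object' slot."""
    subj = TO_SUBJECT.get(word.lower())
    if subj is None:
        return word  # not a pronoun: leave it alone
    subj = SWAP.get(subj, subj)
    return subj if role == "subject" else OBJECT_FORM[subj]


def reflect_svo(sentence):
    m = SVO.fullmatch(sentence.strip())
    if not m:
        return None
    subj, verb, obj = m.groups()
    words = [to_role(subj, "subject"), verb]
    if obj is not None:
        words.append(to_role(obj, "object"))
    out = " ".join(words)
    return out[0].upper() + out[1:]


print(reflect_svo("I love you"))  # You love me
```

Only the pronouns change; verb agreement ("I am" comes back as "You am") would need its own table.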
Just a few ideas... (purely out of my enthusiasm for linguistics :-))
Data
As for the word lists you are interested in, here are just a few samples:
- 330 Most Common English Verbs (many of them are transitive, though not all: e.g. "go" and "come" are intransitive)
- Personal Pronouns Chart