Splitting text into sentences
Question
I need to split a text into separate sentences.
At first this seems very simple:
just search for ".", "?" or "!" and add each sentence to an array.
Unfortunately, it is not that simple.
How can I avoid the situation where:
Washington, D.C.
is split into "Washington, D" and "C".
OR
"One time we set off an explosive under the chair of our teacher, Mrs. Thurman."
is split into:
"One time we set off an explosive under the chair of our teacher, Mrs"
and
"Thurman"
Is there perhaps a database of abbreviations that contain "."?
Thanks for tips in advance!
Solution
Check out NLTK. It has out-of-the-box solutions for the problems you described.
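As a rough illustration of what that could look like, here is a minimal sketch using NLTK's Punkt sentence tokenizer. It assumes NLTK is installed. The abbreviation list and the sample text are hand-picked for this example and are not part of NLTK. NLTK also offers nltk.sent_tokenize, which relies on a pre-trained Punkt model that has to be downloaded separately; the sketch below avoids that download by supplying the abbreviations manually.

```python
from nltk.tokenize.punkt import PunktParameters, PunktSentenceTokenizer

# Assumption: a small, hand-written abbreviation list for this example only.
# Punkt expects abbreviations in lowercase and without the final period.
params = PunktParameters()
params.abbrev_types = {"mrs", "d.c"}

tokenizer = PunktSentenceTokenizer(params)

# Sample text built from the two examples in the question.
text = (
    "One time we set off an explosive under the chair of our teacher, "
    "Mrs. Thurman. She was not amused. "
    "Washington, D.C. is the capital of the United States."
)

sentences = tokenizer.tokenize(text)
for sentence in sentences:
    print(sentence)
```

With the abbreviations listed, "Mrs." and "D.C." should no longer be treated as sentence ends. One caveat: without a trained model, a known abbreviation that really does end a sentence (for example "... in Washington, D.C. It is ...") may not be split, so for real text the pre-trained model or a tokenizer trained on your own corpus is probably the better choice.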
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow