Question

I have to split a text into separate sentences.

OK, it seems very simple.

Just search for ".", "?" or "!" and add each sentence to an array.

Unfortunately, it is not that simple.

How can I avoid the situation where:

Washington, D.C.

is split into "Washington, D" and "C".

Or where:

"One time we set off an explosive under the chair of our teacher, Mrs. Thurman."

is split into:

"One time we set off an explosive under the chair of our teacher, Mrs"

and

"Thurman"

Is there perhaps a database of abbreviations and acronyms that contain "."?
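To make the abbreviation-list idea concrete, here is a minimal sketch in plain Python. The `ABBREVIATIONS` set and the `split_sentences` function are hypothetical names invented for this illustration, and the list is hand-picked; a real list would need to be much larger.

```python
import re

# Hypothetical, hand-picked list: lowercase, without the final period.
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "d.c", "u.s", "e.g", "i.e"}

def split_sentences(text):
    """Split on ., ? or ! unless the period ends a known abbreviation."""
    sentences = []
    start = 0
    # Candidate boundary: terminator followed by whitespace or end of text.
    for match in re.finditer(r"[.?!](?=\s|$)", text):
        end = match.end()
        words = text[start:end].split()
        last_word = words[-1] if words else ""
        # Skip periods that belong to an abbreviation such as "Mrs." or "D.C."
        if match.group() == "." and last_word[:-1].lower() in ABBREVIATIONS:
            continue
        sentence = text[start:end].strip()
        if sentence:
            sentences.append(sentence)
        start = end
    # Keep whatever is left after the last accepted boundary.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

example = ("Our teacher was Mrs. Thurman. "
           "She moved to Washington, D.C. last year. Really?")
result = split_sentences(example)
```

One known limitation of this sketch: when an abbreviation really does end a sentence ("... Washington, D.C. She left."), the boundary is missed, which is part of why a trained tokenizer tends to work better than a fixed list.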

Thanks for tips in advance!

Solution

Check out NLTK (the Natural Language Toolkit for Python). Its sentence tokenizer offers out-of-the-box solutions for the problems you described.
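As a rough illustration, NLTK's Punkt sentence tokenizer can be given a set of known abbreviations. The out-of-the-box route is presumably `nltk.sent_tokenize`, which relies on a pretrained model that has to be downloaded first; the sketch below instead supplies a small hand-picked abbreviation set (an assumption made for this example) so that it runs without any downloaded data.

```python
from nltk.tokenize.punkt import PunktParameters, PunktSentenceTokenizer

params = PunktParameters()
# Hand-picked abbreviations (assumption): lowercase, without the final period.
params.abbrev_types = {"mr", "mrs", "dr", "d.c"}

# Passing ready-made parameters skips training on a corpus.
tokenizer = PunktSentenceTokenizer(params)

text = ("One time we set off an explosive under the chair of our teacher, "
        "Mrs. Thurman. She lives in Washington, D.C. now.")
sentences = tokenizer.tokenize(text)
for sentence in sentences:
    print(sentence)
```

With the pretrained English model installed, `nltk.sent_tokenize(text)` should make the abbreviation set unnecessary for common cases like "Mrs.".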

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow