Question

I need to split text into separate sentences.

At first this seems very simple.

Just search for ".", "?" or "!" and add each sentence to an array.

Unfortunately, it is not that simple.

How can I avoid the situation where:

Washington, D.C.

is split into "Washington, D" and "C".

Or where:

"One time we set off an explosive under the chair of our teacher, Mrs. Thurman."

is split into:

"One time we set off an explosive under the chair of our teacher, Mrs"

and

"Thurman"

Is there perhaps a database of abbreviations that contain "."?

Thanks in advance for any tips!

Solution

Check out NLTK. It has out-of-the-box solutions for the problems you described.
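
In NLTK the usual entry point is nltk.sent_tokenize(text), which needs the Punkt model data to be downloaded first. To show the idea behind abbreviation-aware splitting without any dependencies, here is a minimal sketch using only the standard library. It is not NLTK's algorithm: the ABBREVIATIONS set and the split_sentences function are hypothetical names made up for this illustration, and the abbreviation list is hand-picked rather than taken from any real database.

```python
import re

# Hypothetical, hand-picked abbreviations (lowercase, without the final period).
# A real project would need a much larger list or a trained model.
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "d.c", "u.s", "e.g", "i.e"}

# Candidate boundary: a run of . ? or !, optional closing quotes/brackets,
# followed by whitespace or the end of the text.
BOUNDARY = re.compile(r"""[.?!]+["'”’)\]]*(?=\s|$)""")


def split_sentences(text):
    """Split text into sentences, skipping periods that end a known abbreviation."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        # Word immediately before the punctuation, e.g. "Mrs" or "D.C".
        words = text[start:match.start()].split()
        last_word = words[-1].lower().lstrip("\"'“‘([") if words else ""
        # A period right after a known abbreviation is not treated as a boundary.
        if match.group().startswith(".") and last_word in ABBREVIATIONS:
            continue
        sentence = text[start:match.end()].strip()
        if sentence:
            sentences.append(sentence)
        start = match.end()
    # Keep any trailing text that has no closing punctuation.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = (
        "I live in Washington, D.C. with my family. "
        "One time we set off an explosive under the chair of our teacher, "
        "Mrs. Thurman. Was it funny? No!"
    )
    for sentence in split_sentences(sample):
        print(sentence)
```

With this sketch, "D.C." and "Mrs." no longer trigger a split. The trade-off is that a sentence that really does end with a listed abbreviation is merged with the following one, which is one reason a trained tokenizer such as the one in NLTK is usually the better choice.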

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow