Question

I have to split a text into separate sentences.

OK, it seems very simple:

just search for ".", "?" or "!" and add each sentence to an array.

Unfortunately, it is not that simple.

How can I avoid the situation where:

Washington, D.C.

is split into "Washington, D" and "C"?

OR

"One time we set off an explosive under the chair of our teacher, Mrs. Thurman."

is split into:

"One time we set off an explosive under the chair of our teacher, Mrs"

And

"Thurman"

Is there perhaps a database of abbreviations that contain "."?

Thanks in advance for any tips!


Solution

Check out NLTK (the Natural Language Toolkit for Python). It has out-of-the-box solutions for the problems you described, in particular its sentence tokenizer.
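To illustrate the abbreviation-list idea from the question, here is a minimal standard-library sketch. It is not NLTK's algorithm; `ABBREVIATIONS`, `BOUNDARY` and `split_sentences` are hypothetical names, and the abbreviation list is deliberately tiny. A period is treated as a sentence end only if the word before it is not a known abbreviation.

```python
import re

# Hypothetical, deliberately small abbreviation list (lowercase, without the
# trailing period). A real list would need to be much larger.
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "d.c", "e.g", "i.e"}

# Candidate boundary: sentence-ending punctuation, optional closing quotes or
# brackets, then whitespace or the end of the text.
BOUNDARY = re.compile(r"""[.?!]+["'”’)]*(?=\s|$)""")


def split_sentences(text):
    """Split text into sentences, skipping periods that end known abbreviations."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        # Word immediately before the punctuation, e.g. "Mrs" or "D.C".
        words = text[start:match.start()].split()
        last_word = words[-1].lstrip("\"'“‘(").lower() if words else ""
        if text[match.start()] == "." and last_word in ABBREVIATIONS:
            continue  # the period belongs to an abbreviation, keep scanning
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    # Whatever is left after the last accepted boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = ("One time we set off an explosive under the chair of our "
              "teacher, Mrs. Thurman. She was not amused!")
    for sentence in split_sentences(sample):
        print(sentence)
```

A known limitation of this sketch: an abbreviation that really does end a sentence ("I live in Washington, D.C. It is nice.") is not split. With NLTK installed and its sentence tokenizer model downloaded, `nltk.tokenize.sent_tokenize(text)` plays the same role and does not rely on a hand-written list like this one.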

License: CC-BY-SA with attribution
Not affiliated with StackOverflow