You can literally translate your five bullet points to a regular expression:
!|\?|\.{3}|\.\D|\.\s
Note that I'm simply creating an alternation consisting of five alternatives, each of which represents one of your bullet points:
!
\?
\.{3}
\.\D
\.\s
Since the dot (.) and the question mark (?) are special characters within a regular expression pattern, they need to be escaped by a backslash (\) to be treated as literals. The pipe (|) is the delimiting character between two alternatives.
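To see the effect of escaping, here is a small sketch using Python's re module. The sample string is made up for illustration and is not taken from your text:

```python
import re

text = "a.b?c"  # hypothetical sample input

# Unescaped, the dot matches any character (except a newline by default).
print(re.findall(r".", text))      # ['a', '.', 'b', '?', 'c']

# Escaped, the dot and the question mark match only themselves.
print(re.findall(r"\.", text))     # ['.']
print(re.findall(r"\?", text))     # ['?']

# The pipe separates alternatives: a literal dot OR a literal question mark.
print(re.findall(r"\.|\?", text))  # ['.', '?']
```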
Using the above regular expression, you can then split your text into sentences using re.split.
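A minimal sketch of how that could look. The helper name split_sentences and the sample text are my own assumptions, and stripping whitespace and dropping empty pieces is an extra step you may or may not want:

```python
import re

# Pattern copied from above: one alternative per bullet point.
SENTENCE_END = re.compile(r"!|\?|\.{3}|\.\D|\.\s")

def split_sentences(text):
    """Split text at every match of SENTENCE_END and drop empty pieces."""
    parts = SENTENCE_END.split(text)
    return [part.strip() for part in parts if part.strip()]

# Hypothetical sample text, not from the question.
sample = "Hello world! How are you? It costs 3.50 today. Fine... Bye."
print(split_sentences(sample))
# ['Hello world', 'How are you', 'It costs 3.50 today', 'Fine', 'Bye.']
```

Two things are visible here. The dot in 3.50 is left alone because it is followed by a digit, and the final dot in "Bye." is not matched because no character follows it. Also keep in mind that \.\D consumes the character after the dot, so "end.Next" would be split into "end" and "ext".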