Question

The REGEX I'm currently using is this one:

var sentences = fulltext.match(/[^\.!\?]+[\.!\?]+/g);

That returns an array of the sentences, INCLUDING the spaces (I need all the characters). The problem is that it does not work with an ellipsis "...", and I suspect it doesn't work with other unconventional forms of punctuation either.
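A quick way to see where the original pattern breaks. This assumes the ellipsis in the real text is the single Unicode character "…" (U+2026) rather than three ASCII periods. Three periods are already covered by `[\.!\?]+`, but U+2026 is in neither class, so it is treated as ordinary text and the sentence runs on:

```javascript
// The pattern from the question.
const original = /[^\.!\?]+[\.!\?]+/g;

// Three ASCII periods: [\.!\?]+ consumes all three, so the split works.
const threeDots = "First sentence... Second sentence.".match(original);
console.log(threeDots); // ["First sentence...", " Second sentence."]

// One U+2026 character: it is not excluded by [^\.!\?], so both
// sentences come back as a single match.
const unicodeEllipsis = "First sentence\u2026 Second sentence.".match(original);
console.log(unicodeEllipsis); // ["First sentence… Second sentence."]
```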

How can I fix my REGEX to match this and other forms of punctuation?

Is there a noob-friendly, example-driven guide to REGEX out there?


Solution

The Unicode code point of the ellipsis character "…" is U+2026.

So you can use \u2026 in the regex to match an ellipsis.
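A small check that the `\u2026` escape and the literal "…" are the same character, both in strings and in regex literals (plain JavaScript; it assumes the source file is saved as UTF-8 so the literal "…" survives):

```javascript
// "\u2026" in a string literal and "…" typed directly are the same character.
const escaped = "\u2026";
const literal = "…";
console.log(escaped === literal); // true
console.log(literal.codePointAt(0).toString(16)); // 2026

// The same escape works inside a regex literal.
const ellipsisRe = /\u2026/;
console.log(ellipsisRe.test("Wait… what?")); // true
console.log(ellipsisRe.test("Wait... what?")); // false: three ASCII periods are not U+2026
```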

Code:

var fulltext = "First sentence… Second sentence. ";
// \u2026 (the ellipsis) is added to both the excluded set and the terminator set.
fulltext.match(/[^.?!;\u2026]+[.?!;\u2026]+/g);

OUTPUT

["First sentence…", " Second sentence."]
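One caveat, since the question asks for all the characters: the trailing space in `fulltext` (and any final fragment without terminal punctuation) is silently dropped, because every match has to end in punctuation. A possible variant makes the terminator run optional (`*` instead of `+`), so the tail is kept and the pieces join back into the original string. This is a sketch, not part of the answer above; it still drops punctuation at the very start of the text, because each match must begin with a non-punctuation character.

```javascript
const fulltext = "First sentence\u2026 Second sentence. ";

// Answer's pattern: every match must end with punctuation, so the trailing " " is lost.
const strict = fulltext.match(/[^.?!;\u2026]+[.?!;\u2026]+/g);

// Variant: the punctuation run is optional, so a final fragment without it is kept.
// [^...]+ is greedy, so the * only matches zero characters at the end of the string.
const lossless = fulltext.match(/[^.?!;\u2026]+[.?!;\u2026]*/g);

console.log(strict);   // ["First sentence…", " Second sentence."]
console.log(lossless); // ["First sentence…", " Second sentence.", " "]
console.log(lossless.join("") === fulltext); // true
```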


Other answers

You can just add the ellipsis (and any other punctuation characters) to your character sets.
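Extending the sets by hand gets error-prone once you add characters that are special inside a character class (`\`, `]`, `^`, `-`). Below is a minimal sketch that builds the pattern from a list of terminators. `buildSentenceRegex` is a hypothetical helper, and the interrobang (U+203D) is only an example of an extra terminator:

```javascript
// Hypothetical helper: build a sentence-splitting regex from a list of
// single-character terminators (e.g. ".", "?", "!", "…").
function buildSentenceRegex(terminators) {
  // Escape the characters that are special inside [...]: \ ] ^ -
  const cls = terminators
    .map((ch) => ch.replace(/[\\\]^-]/g, "\\$&"))
    .join("");
  // Same shape as the answers: a run of non-terminators, then a run of terminators.
  return new RegExp(`[^${cls}]+[${cls}]+`, "g");
}

const re = buildSentenceRegex([".", "?", "!", ";", "\u2026", "\u203D"]);
console.log("Really\u203D Yes. ".match(re)); // ["Really‽", " Yes."]
```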

var input = "First sentence… Second sentence. ";
input.match(/[^\.\?!;…]+[\.\?!;…]+/g);

Result:

["First sentence…", " Second sentence."]
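Both answers rely on `String.prototype.match` with the `g` flag, which returns `null` (not an empty array) when nothing matches, for example on an empty string or text with no terminal punctuation. A small guard, with `splitSentences` as a hypothetical name:

```javascript
// Hypothetical wrapper: always returns an array, even when there is no match.
function splitSentences(text) {
  // With the g flag, match() returns null if the pattern never matches.
  return text.match(/[^.?!;…]+[.?!;…]+/g) || [];
}

console.log(splitSentences("First sentence… Second sentence. ")); // ["First sentence…", " Second sentence."]
console.log(splitSentences("no terminal punctuation")); // []
console.log("no terminal punctuation".match(/[^.?!;…]+[.?!;…]+/g)); // null
```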
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow