Question
The regex I'm currently using is:
var sentences = fulltext.match(/[^\.!\?]+[\.!\?]+/g);
That returns an array of the sentences, INCLUDING the spaces (I need all the characters). The problem is that it does not work with an ellipsis "...", and I guess it does not work with other unconventional forms of punctuation either.
How can I fix my regex to match this and other forms of punctuation?
Is there a beginner-friendly, example-driven guide to regex out there?
Solution
The Unicode code point of the ellipsis character (…) is U+2026, so you can write \u2026 in the regex to match it.
Code:
var fulltext= "First sentence… Second sentence. ";
fulltext.match(/([^.?!;\u2026]+[.?!;\u2026]+)/g);
Output:
["First sentence…", " Second sentence."]
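Note that the trailing space after the last period is not in the output, because the pattern only matches text that ends in a terminator. Since the question asks to keep all the characters, here is a minimal sketch of one way to keep that leftover text too. The helper name splitSentences is hypothetical, and the pattern is a variant of the one above (it uses * instead of + on the first set and adds an end-of-input alternative), not the answer's own regex.

```javascript
// Hypothetical helper: splitSentences is an illustrative name, not from the answer.
function splitSentences(text) {
  // Alternative 1: optional run of non-terminators, then one or more terminators.
  // Alternative 2: a trailing run of non-terminators at the very end of the input.
  var pattern = /[^.?!;\u2026]*[.?!;\u2026]+|[^.?!;\u2026]+$/g;
  // match() returns null when nothing matches, so fall back to an empty array.
  return text.match(pattern) || [];
}

var fulltext = "First sentence… Second sentence. ";
var parts = splitSentences(fulltext);
console.log(parts);
// ["First sentence…", " Second sentence.", " "]
console.log(parts.join("") === fulltext);
// true
```

With this variant, joining the pieces gives back the original string, so no characters are lost.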
OTHER TIPS
You can just add the ellipsis (and any other punctuation characters) to your character sets.
var input = "First sentence… Second sentence. ";
input.match(/[^\.\?!;…]+[\.\?!;…]+/g);
Result:
["First sentence…", " Second sentence."]
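If the list of terminators keeps growing, it may be easier to build the character set from an array than to edit the regex by hand. The following is only a sketch: the terminator list and the function name buildSentenceRegex are assumptions for illustration, so adjust the list to whatever punctuation your text actually uses.

```javascript
// Hypothetical list of sentence terminators; extend or trim as needed.
// \u2026 is the ellipsis, \u203D the interrobang, \u3002 the ideographic full stop.
var terminators = [".", "?", "!", ";", "\u2026", "\u203D", "\u3002"];

// Hypothetical helper: builds /[^<set>]+[<set>]+/g from the list above.
function buildSentenceRegex(chars) {
  // Escape the characters that are special inside a character class: \ ] ^ -
  var cls = chars.join("").replace(/[\\\]\^\-]/g, "\\$&");
  return new RegExp("[^" + cls + "]+[" + cls + "]+", "g");
}

var sentenceRegex = buildSentenceRegex(terminators);
var result = "Really‽ Yes… Done.".match(sentenceRegex);
console.log(result);
// ["Really‽", " Yes…", " Done."]
```

Inside a character set, characters such as . and ? need no escaping, which is why only the backslash, closing bracket, caret, and hyphen are escaped here.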