Question

I have a string like this: Hello? My name is Ben! @ My age is 32.

I want to split it into an array in which every word, space and punctuation mark is a separate element. For example, var_dump($sentence) should produce something like this:

array(22) {
  [0]=>
  string(5) "Hello"
  [1]=>
  string(1) "?"
  [2]=>
  string(1) " "
  [3]=>
  string(2) "My"  
  [4]=>
  string(1) " "
  [5]=>
  string(4) "name"
  [6]=>
  string(1) " "  
  [7]=>
  string(2) "is"
  [8]=>   
  string(1) " "  
  [9]=>
  string(3) "Ben"
  [10]=>
  string(1) "!" 
  [11]=>
  string(1) " " 
  [12]=>
  string(1) "@" 

etc...

The only code I've found which comes close to this is:

$sentence = preg_split("/(?<=\w)\b\s*/", 'Hello? My name is Ben! @ My age is 32.');

echo '<pre>';
var_dump($sentence);
echo '</pre>';

which outputs:

array(10) {
[0]=>
string(5) "Hello"
[1]=>
string(4) "? My"
[2]=>
string(4) "name"
[3]=>
string(2) "is"
[4]=>
string(3) "Ben"
[5]=>
string(6) "! @ My"
[6]=>
string(3) "age"
[7]=>
string(2) "is"
[8]=>
string(2) "32"
[9]=>
string(1) "."
}

How do I change this so that the spaces and punctuation marks become separate elements in the array?


Solution

No need for lookarounds: just make preg_split capture the delimiters as well, using the PREG_SPLIT_DELIM_CAPTURE flag:

$str = 'Hello? My name is Ben! @ My age is 32.';
// -1 means "no limit"; passing null for $limit is deprecated since PHP 8.1
$arr = preg_split('/(\W)/', $str, -1,
  PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

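With this setup, each \W (non-word) character is captured separately as a delimiter, while consecutive \w characters stay together as the pieces of the string between delimiters. PREG_SPLIT_NO_EMPTY drops the empty strings that would otherwise appear between two adjacent delimiters (for example between "?" and the space that follows it).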
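To check the result against the sample sentence, here is a minimal sketch that wraps the same call in a function and prints the tokens. The helper name tokenize and the use of json_encode for compact output are my own assumptions for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical helper (name chosen for illustration only).
function tokenize(string $text): array
{
    // Split on every single non-word character; the capture group plus
    // PREG_SPLIT_DELIM_CAPTURE keeps each delimiter as its own element,
    // and PREG_SPLIT_NO_EMPTY discards empty pieces between delimiters.
    return preg_split(
        '/(\W)/',
        $text,
        -1,
        PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
    );
}

$tokens = tokenize('Hello? My name is Ben! @ My age is 32.');

// Compact view of the array; var_dump($tokens) works just as well.
echo json_encode($tokens), PHP_EOL;
// ["Hello","?"," ","My"," ","name"," ","is"," ","Ben","!"," ","@"," ","My"," ","age"," ","is"," ","32","."]
```

Joining the tokens back together with implode('', $tokens) reproduces the input string, which is a quick way to confirm that nothing was dropped. Keep in mind that \w also matches digits and the underscore, so "32" stays a single token.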

Licensed under: CC-BY-SA with attribution