PHP: Using str_word_count with str_split to form an array after x words

https://stackoverflow.com/questions/12176053

29-06-2021

Question

I've got a large string that I want to split into an array after every 50 words. I thought about using str_split to cut it, but realised that it won't take words into consideration; it just splits after every x characters.

I've read about str_word_count but can't work out how to put the two together.

What I've got at the moment is:

$outputArr = str_split($output, 250);

foreach ($outputArr as $arOut) {
    echo $arOut;
    echo "<br />";
}

But I want to change that so each item of the array holds 50 words instead of 250 characters.

Any help will be much appreciated.

Solution

Assuming that str_word_count is sufficient for your needs¹, you can simply call it with 1 as the second parameter and then use array_chunk to group the words in groups of 50:

$words = str_word_count($string, 1);
$chunks = array_chunk($words, 50);

You now have an array of arrays; to join every 50 words together and make it an array of strings you can use

foreach ($chunks as &$chunk) { // important: iterate by reference!
    $chunk = implode(' ', $chunk);
}
unset($chunk); // break the reference so later code cannot overwrite the last item by accident
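For convenience, the two steps above can be wrapped in one function. This is only a minimal sketch: the name chunkWords and its parameters are made up for illustration, and it uses array_map instead of a by-reference loop (arrow functions need PHP 7.4 or later).

```php
<?php

// Hypothetical helper (not a built-in): splits $text into strings of
// at most $wordsPerChunk words each, using str_word_count's idea of a word.
function chunkWords(string $text, int $wordsPerChunk): array
{
    // Format 1 makes str_word_count return the words as an array.
    $words = str_word_count($text, 1);

    // Group the words, then join every group back into a single string.
    return array_map(
        fn(array $group): string => implode(' ', $group),
        array_chunk($words, $wordsPerChunk)
    );
}

$chunks = chunkWords('one two three four five six seven', 3);
// $chunks is ['one two three', 'four five six', 'seven']
```

Note that punctuation and digits are lost here, because str_word_count only returns what it considers words.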

¹ Most probably it is not. If you want to get what most humans consider acceptable results when processing written language you will have to use preg_split with some suitable regular expression instead.
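As a rough illustration of the footnote, here is a sketch that splits on runs of whitespace with preg_split, so digits and punctuation stay attached to their words. The pattern and the function name chunkByWhitespace are assumptions chosen for this example; a "suitable" expression depends on the text being processed.

```php
<?php

// Hypothetical helper: treats every whitespace-separated token as a word.
function chunkByWhitespace(string $text, int $wordsPerChunk): array
{
    // The u modifier treats the input as UTF-8; PREG_SPLIT_NO_EMPTY drops
    // the empty pieces produced by leading or trailing whitespace.
    $tokens = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($tokens === false) {
        return []; // preg_split failed, e.g. on invalid UTF-8 input
    }

    return array_map(
        fn(array $group): string => implode(' ', $group),
        array_chunk($tokens, $wordsPerChunk)
    );
}

$chunks = chunkByWhitespace('It costs 50 dollars, really. Yes!', 3);
// $chunks is ['It costs 50', 'dollars, really. Yes!']
```

With str_word_count the same input would lose "50" and all punctuation, since only letters, ' and - count as word characters there.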

OTHER TIPS

There's another way:

<?php

$someBigString = <<<SAMPLE
  This, actually, is a nice' old'er string, as they said, "divided and conquered".
SAMPLE;

// change this to whatever you need to:     
$number_of_words = 7; 

$arr = preg_split("#([a-z]+[a-z'-]*(?<!['-]))#i", 
  $someBigString, $number_of_words + 1, PREG_SPLIT_DELIM_CAPTURE);

$res = implode('', array_slice($arr, 0, $number_of_words * 2));
echo $res;


I consider preg_split a better tool (than str_word_count) here. Not because the latter is inflexible (it is not: you can define what symbols can make up a word with its third param), but because preg_split will essentially stop processing the string after getting N items.

The trick, as is quite common with this function, is to capture the delimiters as well, then use them to reconstruct the string with the first N words (where N is given) and the punctuation marks preserved.

(Of course, the regex used in my example does not strictly comply with the locale-dependent behavior of str_word_count. But it still restricts words to letters, ' and -, with the latter two not allowed at the beginning or the end of a word.)
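The same delimiter-capturing trick could be stretched to cover the original question, that is, splitting the whole string into groups of N words while keeping the punctuation. The following is a sketch under that assumption, not part of the answer above; the function name chunkKeepingPunctuation is hypothetical, and because it passes -1 as the limit it processes the entire string, so the early-stop advantage no longer applies.

```php
<?php

// Hypothetical helper: groups of $wordsPerChunk words, punctuation kept.
function chunkKeepingPunctuation(string $text, int $wordsPerChunk): array
{
    // Same word pattern as in the example above.
    $pattern = "#([a-z]+[a-z'-]*(?<!['-]))#i";
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    // $parts alternates: leading text, word, separator, word, ..., trailing text.
    $leading = array_shift($parts);
    if ($parts === []) {
        return []; // the text contains no words at all
    }

    // After the shift, $parts is a list of (word, following text) pairs,
    // so 2 * $wordsPerChunk elements hold exactly $wordsPerChunk words.
    $chunks = [];
    foreach (array_chunk($parts, $wordsPerChunk * 2) as $group) {
        $chunks[] = implode('', $group);
    }
    $chunks[0] = $leading . $chunks[0];

    return array_map('trim', $chunks);
}

$chunks = chunkKeepingPunctuation('One, two; three. Four - five!', 2);
// $chunks is ['One, two;', 'three. Four -', 'five!']
```

Punctuation that follows a word stays with that word's chunk; text that precedes a word (an opening quote, for instance) ends up at the end of the previous chunk, which may or may not be acceptable.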

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow