Question

I've got a large string that I want to split into an array every 50 words. I thought about using str_split to cut it, but realised that it won't take words into consideration; it just splits every x characters.

I've read about str_word_count but can't work out how to put the two together.

What I've got at the moment is:

$outputArr = str_split($output, 250);

foreach ($outputArr as $arOut) {
    echo $arOut;
    echo "<br />";
}

But I want to change that so each item of the array holds 50 words instead of 250 characters.

Any help will be much appreciated.


Solution

Assuming that str_word_count is sufficient for your needs¹, you can simply call it with 1 as the second parameter (which makes it return the words as an array) and then use array_chunk to split the words into groups of 50:

$words = str_word_count($string, 1);
$chunks = array_chunk($words, 50);

You now have an array of arrays; to join every 50 words together and turn it into an array of strings, you can use:

foreach ($chunks as &$chunk) { // important: iterate by reference!
    $chunk = implode(' ', $chunk);
}
unset($chunk); // break the reference so later code cannot overwrite the last element by accident
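For reference, the two steps above can be combined into one function. This is only a sketch: the name chunkWords() is made up for illustration, and it uses array_map instead of the by-reference loop, which should give the same result.

```php
<?php
// Hypothetical helper; the name chunkWords() is not part of the answer above.
function chunkWords(string $text, int $wordsPerChunk): array
{
    // Format 1 makes str_word_count() return the words as a list.
    $words = str_word_count($text, 1);

    // Group the words, then join each group back into a single string.
    return array_map(
        function (array $group): string {
            return implode(' ', $group);
        },
        array_chunk($words, $wordsPerChunk)
    );
}

// Same output loop as in the question, with 3 words per item for brevity.
$output = 'one two three four five six seven';
foreach (chunkWords($output, 3) as $arOut) {
    echo $arOut;
    echo "<br />\n";
}
```

For the 50-word case from the question, the call would be chunkWords($output, 50).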

¹ Most probably it is not. If you want to get what most humans consider acceptable results when processing written language you will have to use preg_split with some suitable regular expression instead.
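As one possible reading of "some suitable regular expression", here is a sketch that splits on whitespace only, so digits and punctuation stay attached to their words. The function name and the whitespace-only rule are assumptions made for this example; your text may need a stricter pattern.

```php
<?php
// Hypothetical helper; splitting on whitespace only is an assumption.
function chunkByWhitespace(string $text, int $wordsPerChunk): array
{
    // \s+ matches any run of whitespace; the u flag treats the input as UTF-8.
    $tokens = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($tokens === false) {
        return []; // preg_split() failed, e.g. on invalid UTF-8 input
    }

    $chunks = [];
    foreach (array_chunk($tokens, $wordsPerChunk) as $group) {
        $chunks[] = implode(' ', $group);
    }
    return $chunks;
}

print_r(chunkByWhitespace('It costs 50 dollars, or 45 if you pay today.', 4));
```

Unlike str_word_count, this keeps "50" and "45" and leaves the commas and the full stop in place, at the cost of counting anything between two spaces as a word.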

OTHER TIPS

There's another way:

<?php

$someBigString = <<<SAMPLE
  This, actually, is a nice' old'er string, as they said, "divided and conquered".
SAMPLE;

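// change this to however many words you need:
$number_of_words = 7;

// Split on words, keeping them (the captured delimiters) in the result.
// The limit stops the split after $number_of_words words.
$arr = preg_split("#([a-z]+[a-z'-]*(?<!['-]))#i",
  $someBigString, $number_of_words + 1, PREG_SPLIT_DELIM_CAPTURE);

// Each word is preceded by its non-word text, so N words occupy 2N items.
$res = implode('', array_slice($arr, 0, $number_of_words * 2));
echo $res;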


I consider preg_split a better tool than str_word_count here. Not because the latter is inflexible (it is not: its third parameter lets you define which extra symbols can make up a word), but because preg_split, when given a limit, essentially stops processing the string after getting N items.
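To illustrate the third parameter mentioned above, here is a small sketch; the sample sentence is made up for the example.

```php
<?php
$text = 'It costs 50 dollars';

// Default rules: digits are not word characters, so "50" is dropped.
$default = str_word_count($text, 1);

// Third parameter: extra characters that are allowed to form words.
$withDigits = str_word_count($text, 1, '0123456789');

print_r($default);    // It, costs, dollars
print_r($withDigits); // It, costs, 50, dollars
```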

The trick, as is quite common with this function, is to capture the delimiters as well, then use them to reconstruct the string with the first N words (where N is given) and the punctuation marks preserved.

(Of course, the regex used in my example does not strictly match the locale-dependent behaviour of str_word_count. But it still restricts words to letters, ' and -, and a word must start with a letter and cannot end with ' or -.)
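The same delimiter-capture trick can probably be extended from "the first N words" to "every group of N words", which is what the question asks for. The following is a sketch under assumptions: the function name is invented, the pattern is the one from the example above, and the text following each word is attached to that word, so an opening quote can end up at the end of the previous chunk.

```php
<?php
// Hypothetical helper; only the regex is taken from the answer above.
function chunkKeepingPunctuation(string $text, int $wordsPerChunk): array
{
    $pattern = "#([a-z]+[a-z'-]*(?<!['-]))#i";

    // With the delimiters captured, the result alternates:
    // non-word text, word, non-word text, word, ..., non-word text.
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    // Set the leading non-word text aside so that every remaining
    // pair is (word, text that follows it).
    $head = array_shift($parts);

    $chunks = [];
    foreach (array_chunk($parts, 2 * $wordsPerChunk) as $group) {
        $chunks[] = implode('', $group);
    }

    // Put the leading text back in front of the first chunk.
    if ($chunks !== []) {
        $chunks[0] = $head . $chunks[0];
    }
    return array_map('trim', $chunks);
}

print_r(chunkKeepingPunctuation('Hello, world! How are you?', 2));
```

With 2 words per chunk, this should give "Hello, world!", "How are" and "you?". Note that, unlike the limited preg_split call, this version has to scan the whole string.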

Licensed under: CC-BY-SA with attribution