How efficient is PHP's substr?
26-09-2019
Question
I'm writing a parser in PHP which must be able to handle large in-memory strings, so this is a somewhat important issue (i.e., please don't flame me for "premature optimization").
How does the substr function work? Does it make a second copy of the string data in memory, or does it reference the original? Should I worry about calling, for example, $str = substr($str, 1); in a loop?
Solution
To expand on Chad's comment: your code requires both strings (the full one and the full one minus its first character) to be in memory at the same time. This is not because of the assignment, as Chad stated; substr builds the new string whether or not its result is assigned. See:
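$string = str_repeat('x', 1048576); # 1 MiB test string
printf("MEM: %d\nPEAK: %d\n", memory_get_usage(), memory_get_peak_usage());
# Result discarded: the peak still rises by roughly the size of the string
substr($string, 1);
printf("MEM: %d\nPEAK: %d :-(\n", memory_get_usage(), memory_get_peak_usage());
# Result assigned: the old string can only be released once the new one exists
$string = substr($string, 1);
printf("MEM: %d\nPEAK: %d :-(\n", memory_get_usage(), memory_get_peak_usage());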
Outputs something like (memory values are in bytes):
MEM: 1093256
PEAK: 1093488
MEM: 1093280
PEAK: 2142116 :-(
MEM: 1093276
PEAK: 2142116 :-(
OTHER TIPS
If you're really looking into efficiency, you will need to keep a pointer (I mean an index) alongside your string. Many string functions accept an offset to start operating from (like strpos()'s third parameter). Normally I would recommend writing an object to wrap this functionality, but if you expect to call it a lot, the wrapper itself might become a performance bottleneck. Here is an example of what I mean (without OO):
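$startIndex = 0;
# strpos() returns false when nothing is found, so compare strictly
while (($pos = strpos($string, $myToken, $startIndex)) !== false) {
    # do something using $pos
    # move past the match, otherwise strpos() finds the same one again
    $startIndex = $pos + strlen($myToken);
}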
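To make the offset idea concrete, here is a minimal sketch that splits a string on a delimiter by advancing an index instead of shrinking the string. The function name splitByOffset and the sample input are made up for illustration; only strpos(), substr() and strlen() are real PHP functions.

```php
<?php
// Hypothetical helper: split $string on $token without ever shortening $string.
function splitByOffset(string $string, string $token): array
{
    if ($token === '') {
        throw new InvalidArgumentException('Token must not be empty');
    }
    $parts = [];
    $startIndex = 0;
    $tokenLength = strlen($token);

    // strpos() returns false (not 0) when there is no further match,
    // so the comparison has to be strict.
    while (($pos = strpos($string, $token, $startIndex)) !== false) {
        // Copies only the small piece between two tokens.
        $parts[] = substr($string, $startIndex, $pos - $startIndex);
        // Step past the token so the same match is not found again.
        $startIndex = $pos + $tokenLength;
    }
    // Whatever follows the last token.
    $parts[] = substr($string, $startIndex);

    return $parts;
}

print_r(splitByOffset('alpha,beta,,gamma', ','));
```

The large string is never reassigned; the only copies made are the short pieces that are actually kept.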
If you want, you can write your own wrapper class that does these string operations and see if it has a speed impact:
class _String {
    private $string;
    private $startIndex;
    private $endIndex;
    public function __construct($string) {
        $this->string = $string;
        $this->startIndex = 0;
        $this->endIndex = strlen($string);
    }
    # Narrows the visible window instead of copying the string.
    # $from is relative to the current start, like substr(); negative values are not handled.
    public function substr($from, $length = NULL) {
        $this->startIndex += $from;
        if ($length !== NULL) {
            $this->endIndex = $this->startIndex + $length;
        }
    }
    # other functions you might use
    # ...
}
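As a rough idea of what the "other functions" could look like, here is a sketch of a cursor-style wrapper. The class name StringCursor and its methods (eof, peek, advance, readUntil) are invented for this example and are not part of PHP; whether the method-call overhead is acceptable is something you would have to measure on your own data.

```php
<?php
// Hypothetical cursor class; the name and methods are illustrative only.
final class StringCursor
{
    private $string;
    private $pos = 0;
    private $length;

    public function __construct(string $string)
    {
        $this->string = $string;
        $this->length = strlen($string);
    }

    public function eof(): bool
    {
        return $this->pos >= $this->length;
    }

    // Look at the current byte without moving; '' at the end of the input.
    public function peek(): string
    {
        return $this->eof() ? '' : $this->string[$this->pos];
    }

    // The cursor equivalent of $str = substr($str, $n): only an integer changes.
    public function advance(int $n = 1): void
    {
        $this->pos = min($this->length, $this->pos + max(0, $n));
    }

    // Return everything up to the next $token and move just past it.
    // If the token never occurs, the rest of the string is returned.
    public function readUntil(string $token): string
    {
        if ($token === '') {
            throw new InvalidArgumentException('Token must not be empty');
        }
        if ($this->eof()) {
            return '';
        }
        $found = strpos($this->string, $token, $this->pos);
        $end = ($found === false) ? $this->length : $found;
        $piece = substr($this->string, $this->pos, $end - $this->pos);
        $this->pos = ($found === false) ? $this->length : $found + strlen($token);
        return $piece;
    }
}

$cursor = new StringCursor('key=value;next=1');
$key = $cursor->readUntil('=');
$value = $cursor->readUntil(';');
printf("%s => %s\n", $key, $value);
```

Each readUntil() call copies only the piece it returns; the underlying string stays untouched for the lifetime of the cursor.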
Yes, you should be careful with string manipulation inside a loop: functions such as substr return a new string, so a call like $str = substr($str, 1); generates a new copy of the string on each iteration.
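A quick way to see this for yourself is to run the same scan twice, once by chopping the string and once by moving an index. This is only a sketch: the function names countByChopping and countByIndex and the sample input are made up, and the timings depend entirely on your PHP version and machine.

```php
<?php
// Counts occurrences of $char by chopping one byte off the front each time.
// Every iteration builds a new, slightly shorter string.
function countByChopping(string $str, string $char): int
{
    $count = 0;
    while ($str !== '') {
        if ($str[0] === $char) {
            $count++;
        }
        $str = substr($str, 1);
    }
    return $count;
}

// Same result, but only an integer index changes inside the loop.
function countByIndex(string $str, string $char): int
{
    $count = 0;
    $length = strlen($str);
    for ($i = 0; $i < $length; $i++) {
        if ($str[$i] === $char) {
            $count++;
        }
    }
    return $count;
}

$input = str_repeat('ab', 10000); // 20,000 bytes of sample data

$t = microtime(true);
$chopped = countByChopping($input, 'a');
$choppingSeconds = microtime(true) - $t;

$t = microtime(true);
$indexed = countByIndex($input, 'a');
$indexSeconds = microtime(true) - $t;

printf("chopping: %d matches in %.4fs\n", $chopped, $choppingSeconds);
printf("index:    %d matches in %.4fs\n", $indexed, $indexSeconds);
```

Both functions return the same count; the chopping version does work proportional to the remaining string length on every iteration, so the gap should widen as the input grows.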