Question

I'm writing a parser in PHP that must handle large in-memory strings, so this is a fairly important issue. (I.e., please don't flame me about premature optimization.)

How does the substr function work? Does it make a second copy of the string data in memory, or does it reference the original? Should I worry about calling, for example, $str = substr($str, 1); in a loop?


Solution

To expand on Chad's comment: your code requires both strings (the full one and the full-one-minus-first-character) to be in memory at the same time, though, contrary to what Chad stated, the assignment is not the cause. See:

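<?php
// Build a 1 MiB string and record the baseline.
$string = str_repeat('x', 1048576);
printf("MEM:  %d\nPEAK: %d\n", memory_get_usage(), memory_get_peak_usage());

// Even a discarded substr() result is a full copy (minus one byte),
// so the peak roughly doubles while both strings exist.
substr($string, 1);
printf("MEM:  %d\nPEAK: %d  :-(\n", memory_get_usage(), memory_get_peak_usage());

// Assigning back frees the old string afterwards, but the peak is the same:
// the copy is created before the old value is released.
$string = substr($string, 1);
printf("MEM:  %d\nPEAK: %d  :-(\n", memory_get_usage(), memory_get_peak_usage());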

This outputs something like the following (values are in bytes and vary by PHP version and platform):

MEM:  1093256
PEAK: 1093488
MEM:  1093280
PEAK: 2142116  :-(
MEM:  1093276
PEAK: 2142116  :-(

OTHER TIPS

If you really care about efficiency, keep an index (an offset into the string) instead of repeatedly shortening the string. Many string functions accept an offset to start from, such as the third parameter of strpos(). Normally I would recommend wrapping this in an object, but if its methods are called very often, the extra method-call overhead might become a bottleneck. Here is an example of what I mean (without OO):

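$startIndex = 0;
while (($pos = strpos($string, $myToken, $startIndex)) !== false) {
    # do something using $pos
    # move past the token, otherwise strpos() finds it again forever
    $startIndex = $pos + strlen($myToken);
}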
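The same idea extended into a complete field splitter, as a rough sketch. `split_by_token()` is a hypothetical name made up for this example (for plain splitting, the built-in explode() already does this); the point is the loop shape: only the extracted pieces are copied, and the large input string is never shortened or rebuilt.

```php
<?php
// Hypothetical helper: split $string on $token by advancing an offset
// instead of repeatedly doing $string = substr($string, ...).
// Only the individual pieces are copied; $string itself is never rebuilt.
function split_by_token(string $string, string $token): array
{
    if ($token === '') {
        // An empty token would match at every offset and never advance.
        throw new InvalidArgumentException('Token must not be empty');
    }

    $pieces = [];
    $start = 0;
    $tokenLength = strlen($token);

    // strpos()'s third argument tells it where to begin searching.
    while (($pos = strpos($string, $token, $start)) !== false) {
        $pieces[] = substr($string, $start, $pos - $start);
        // Skip past the token so the next search does not find it again.
        $start = $pos + $tokenLength;
    }

    // Whatever follows the last token (possibly an empty string).
    $pieces[] = substr($string, $start);
    return $pieces;
}

print_r(split_by_token("a,bb,,ccc", ","));
```

This assumes PHP 7 or later (scalar type declarations). The same pattern works for a real parser: replace the `$pieces[] = ...` line with whatever processing each field needs.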

If you want, you can write your own wrapper class for these string operations and measure whether the method-call overhead matters for your workload:

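class _String {
    private $string;
    private $startIndex;
    private $endIndex;
    public function __construct($string) {
        $this->string = $string;
        $this->startIndex = 0;
        $this->endIndex = strlen($string);
    }
    # Narrow the view instead of copying: $from is relative to the current
    # start, like substr() on the current view (non-negative values only).
    public function substr($from, $length = NULL) {
        $this->startIndex = min($this->startIndex + $from, $this->endIndex);
        if ($length !== NULL) {
            $this->endIndex = min($this->startIndex + $length, $this->endIndex);
        }
        return $this;
    }
    # Copy the current view out only when a real string is needed.
    public function __toString() {
        return (string) substr($this->string, $this->startIndex, $this->endIndex - $this->startIndex);
    }
    # other functions you might use
    # ...
}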
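A slightly fuller sketch of such a wrapper, assuming the parser mostly needs to skip runs of characters, read up to a delimiter, and match patterns at the current position. The class name `StringCursor` and its methods are hypothetical; the offset arguments it passes to strspn(), strcspn() and preg_match() are real parameters of those built-ins. The `\G` escape anchors a regex at the offset given to preg_match(), which `^` does not do.

```php
<?php
// Hypothetical cursor over a large string: it stores an offset and never
// shortens the underlying string. Requires PHP 7.4+ (typed properties).
final class StringCursor
{
    private string $string;
    private int $pos = 0;
    private int $length;

    public function __construct(string $string)
    {
        $this->string = $string;
        $this->length = strlen($string);
    }

    public function atEnd(): bool
    {
        return $this->pos >= $this->length;
    }

    public function position(): int
    {
        return $this->pos;
    }

    // Skip any run of the given characters (e.g. spaces) without copying.
    public function skip(string $chars): void
    {
        $this->pos += strspn($this->string, $chars, $this->pos);
    }

    // Return everything up to (not including) the first of $stopChars and
    // advance to that character. Only the returned piece is copied.
    public function readUntil(string $stopChars): string
    {
        $n = strcspn($this->string, $stopChars, $this->pos);
        $piece = substr($this->string, $this->pos, $n);
        $this->pos += $n;
        return $piece;
    }

    // Match $regex at the current offset (use \G to anchor it there).
    // Returns the matched text and advances, or null if nothing matches.
    public function match(string $regex): ?string
    {
        if (preg_match($regex, $this->string, $m, 0, $this->pos) !== 1) {
            return null;
        }
        $this->pos += strlen($m[0]);
        return $m[0];
    }
}

$c = new StringCursor("  foo = 42;");
$c->skip(" ");
$name = $c->readUntil(" =");
$c->skip(" =");
$value = $c->match('/\G\d+/');
```

In the usage lines, `$name` ends up as "foo" and `$value` as "42", and the cursor sits on the `;`. Whether this beats inline offset code depends on how hot the calls are, which is exactly what the measurement suggested above would tell you.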

Yes, be careful with string manipulation inside a loop: most string functions return a new string rather than modifying the original, so each iteration allocates a fresh copy.
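For the exact loop in the question, a rough comparison, assuming PHP 7 or later and a parser that consumes one byte per iteration. Both functions are hypothetical and return the same count, but the first copies the remaining string on every pass (about n²/2 bytes in total for an n-byte string), while the second only reads `$str[$i]` and never copies the input.

```php
<?php
// Hypothetical example: count occurrences of a single-byte $needle.

// Pattern from the question: each substr() call allocates a new string
// holding everything except the first byte, so total copying is O(n^2).
function count_by_chopping(string $str, string $needle): int
{
    $count = 0;
    while ($str !== '') {
        if ($str[0] === $needle) {
            $count++;
        }
        $str = substr($str, 1);
    }
    return $count;
}

// Same result by moving an index: the input is never copied, O(n).
function count_by_index(string $str, string $needle): int
{
    $count = 0;
    $length = strlen($str);
    for ($i = 0; $i < $length; $i++) {
        if ($str[$i] === $needle) {
            $count++;
        }
    }
    return $count;
}
```

On a few kilobytes the difference is invisible; on a multi-megabyte input the chopping version spends most of its time in memcpy-style copying.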

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow