PHPExcel toArray function returns unnecessary rows

https://stackoverflow.com/questions/22685620

phpexcel

22-06-2023

Question

I am using PHPExcel to read and parse Excel files. It is a good library; it is just a shame that the memory footprint is so high (here's hoping that moving to PHP 5.3 will enable further optimizations). My question concerns the toArray() method in Worksheet.php, which returns the sheet's data as an array. I read the file in chunks of rows in the recommended way (using a read filter), but the returned array is padded with null values up to the row where the read filter's range starts.

For example, if the read filter's row range starts at row 3 and reads 2 rows, I get:

   0,    1,    2
1: null, null, null
2: null, null, null
3: "1",  "2",  "3"
4: "a",  "b",  "c"

Won't this affect the memory footprint when reading large files? Is there a way to prevent this behavior? (I can't find any references.) Calling array_slice on the result seems pointless: it just creates a new, smaller array, and the original array has already caused the memory spike.

Solution

The toArray() method /always/ takes the values from cell A1 to the highest column and row in the worksheet, irrespective of whether those cells are populated. And yes, that does affect the memory footprint, which is why we also provide a rangeToArray() method that lets you specify the actual range, e.g.

$myArray = $objPHPExcel->getActiveSheet()->rangeToArray('A3:C4');

When you apply a read filter to read only cells A3 to C4, PHPExcel still populates cell A3 of the worksheet with the value of cell A3 from the original file; it simply doesn't populate cells outside the read filter. Those cells are never loaded, which saves memory. But if your script accesses a cell outside that range (and toArray() does reference cells A1, B1, C1, A2, B2, C2 in your case), empty cells are created in memory, which leads to this memory spike.
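
Putting the two pieces together, a minimal sketch of chunked reading might look like the following. It assumes PHPExcel 1.8 is available on the include path; the file name example.xlsx and the class name ChunkReadFilter are hypothetical and only serve the illustration. The point is that the range passed to rangeToArray() matches the rows accepted by the read filter, so no cells outside the filtered range are touched.

```php
<?php
// Assumption: PHPExcel 1.8 is on the include path; adjust the path to your setup.
require_once 'PHPExcel/IOFactory.php';

/**
 * Hypothetical read filter that accepts only rows
 * $startRow .. $startRow + $chunkSize - 1.
 */
class ChunkReadFilter implements PHPExcel_Reader_IReadFilter
{
    private $startRow;
    private $endRow;

    public function __construct($startRow, $chunkSize)
    {
        $this->startRow = $startRow;
        $this->endRow   = $startRow + $chunkSize - 1;
    }

    public function readCell($column, $row, $worksheetName = '')
    {
        // Load a cell only if its row lies inside the current chunk.
        return $row >= $this->startRow && $row <= $this->endRow;
    }
}

$inputFile = 'example.xlsx'; // hypothetical file name
$startRow  = 3;              // values taken from the question's example
$chunkSize = 2;
$endRow    = $startRow + $chunkSize - 1;

// Load only the cells accepted by the filter.
$objReader = PHPExcel_IOFactory::createReaderForFile($inputFile);
$objReader->setReadFilter(new ChunkReadFilter($startRow, $chunkSize));
$objPHPExcel = $objReader->load($inputFile);

$sheet = $objPHPExcel->getActiveSheet();

// Build a range covering exactly the loaded rows, e.g. 'A3:C4'.
$range = 'A' . $startRow . ':' . $sheet->getHighestColumn() . $endRow;

// Unlike toArray(), this does not start at A1, so rows 1 and 2 are never accessed.
$rows = $sheet->rangeToArray($range);

// Release the workbook before loading the next chunk.
$objPHPExcel->disconnectWorksheets();
unset($objPHPExcel);
```

To process a whole file, the same steps would be repeated in a loop with $startRow advanced by $chunkSize each time, creating a fresh filter and reloading the file for each chunk.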

Licensed under: CC-BY-SA with attribution
