Question

I'm using PHPExcel to read through Excel spreadsheets of various sizes and then import the cell data into a database. Reading through the spreadsheet itself works great and is very quick, but I've noticed that the time to actually load/open the file for PHPExcel to use can take up to 10-20 seconds (the larger the file, the longer it takes--especially if the spreadsheet is >1MB in size).

This is the code I'm using to load the file before iterating through it:

$filetype = PHPExcel_IOFactory::identify($file);
$objReader = PHPExcel_IOFactory::createReader($filetype);
$objReader->setReadDataOnly(true);
$objPHPExcel = $objReader->load($file);

What can I do to get the file to load faster? It's frustrating that the greatest latency in importing the data is just in opening up the file initially.

Thank you!


Solution

I've seen this same behavior with Ruby and an Excel library: a non-trivial amount of time to open a large file, where large is > 500KB.

I think the cause is two things:

1) an xlsx file is zip-compressed, so it must first be decompressed

2) an xlsx file is a collection of XML files, all of which must be parsed.

#1 can be a small hit, but it most likely pales in comparison to #2. I believe it's the XML parsing that is the real culprit. In addition, the XML parser is DOM-based, so the whole XML document must be parsed and loaded into memory before any of it can be used.
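One way to check this hypothesis is to time the two phases separately. The sketch below is only an illustration: timeLoadPhases() is a hypothetical helper, simplexml_load_string() stands in for whatever whole-document parser the library uses, and the workbook is a synthetic single-sheet file built on the spot rather than a real spreadsheet.

```php
<?php
// Hypothetical helper (not part of PHPExcel): times decompression of one
// worksheet entry and the whole-document XML parse of it, separately.
function timeLoadPhases(string $xlsxPath, string $entry = 'xl/worksheets/sheet1.xml'): array
{
    $zip = new ZipArchive();
    if ($zip->open($xlsxPath) !== true) {
        throw new RuntimeException("Cannot open $xlsxPath");
    }

    // Phase 1: pull the entry out of the zip container (decompression).
    $start = microtime(true);
    $xml = $zip->getFromName($entry);
    $unzipSeconds = microtime(true) - $start;
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException("Entry $entry not found");
    }

    // Phase 2: parse the whole document into memory at once.
    $start = microtime(true);
    $doc = simplexml_load_string($xml);
    $parseSeconds = microtime(true) - $start;
    if ($doc === false) {
        throw new RuntimeException("Entry $entry is not valid XML");
    }

    return ['unzip' => $unzipSeconds, 'parse' => $parseSeconds, 'bytes' => strlen($xml)];
}

// Build a synthetic workbook with a single large sheet (assumption: a real
// xlsx has more parts, but the sheet XML is usually the bulk of the data).
$path = tempnam(sys_get_temp_dir(), 'xlsx');
$rowsXml = '';
for ($i = 1; $i <= 20000; $i++) {
    $rowsXml .= "<row r=\"$i\"><c r=\"A$i\"><v>$i</v></c></row>";
}
$zip = new ZipArchive();
$zip->open($path, ZipArchive::CREATE | ZipArchive::OVERWRITE);
$zip->addFromString(
    'xl/worksheets/sheet1.xml',
    "<worksheet><sheetData>$rowsXml</sheetData></worksheet>"
);
$zip->close();

$timings = timeLoadPhases($path);
printf(
    "unzip: %.4fs, parse: %.4fs, xml bytes: %d\n",
    $timings['unzip'],
    $timings['parse'],
    $timings['bytes']
);
unlink($path);
```

Running the same helper against one of your real files should show which phase dominates on your machine. The numbers will vary with the file and the PHP build.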

I don't think there is really anything you can do to speed this up. A large xlsx file contains a lot of XML which must be parsed and loaded into memory.

OTHER TIPS

Actually, there is something you can do. The problem with most XML parsers is that they first load the entire document into memory. For big documents, this takes a considerable amount of time.

A way to avoid this is to use a parser that supports streaming. Instead of loading the entire content of the XML files into memory, you load only the part you need. That way, you can have pretty much only one row in memory at a time. This is super fast AND memory efficient.
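To make the idea concrete, here is a minimal sketch of streaming rows with PHP's built-in XMLReader. It is not how any particular library does it: streamRows() is a hypothetical helper, it only collects raw <v> values, and it does not resolve shared strings, styles or dates.

```php
<?php
// Hypothetical helper: yields one row at a time from worksheet XML, so only
// the current row is held in memory instead of the whole document.
function streamRows(string $sheetXml): Generator
{
    $reader = new XMLReader();
    $reader->XML($sheetXml);
    // For a real file, one option (an assumption to verify on your setup) is
    // $reader->open('zip://' . $file . '#xl/worksheets/sheet1.xml') instead.

    $row = null;
    while ($reader->read()) {
        $isElement = $reader->nodeType === XMLReader::ELEMENT;
        if ($isElement && $reader->localName === 'row') {
            $row = [];
            if ($reader->isEmptyElement) { // <row/> has no end tag event
                yield $row;
                $row = null;
            }
        } elseif ($isElement && $reader->localName === 'v') {
            // Raw cell value; for shared strings this is only an index.
            $row[] = $reader->readString();
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT && $reader->localName === 'row') {
            yield $row;
            $row = null;
        }
    }
    $reader->close();
}

$sampleXml = '<worksheet><sheetData>'
    . '<row r="1"><c r="A1"><v>1</v></c><c r="B1"><v>2</v></c></row>'
    . '<row r="2"><c r="A2"><v>3</v></c></row>'
    . '</sheetData></worksheet>';

foreach (streamRows($sampleXml) as $cells) {
    echo json_encode($cells), "\n"; // prints ["1","2"] then ["3"]
}
```

Because the function is a generator, each row can be inserted into the database and discarded before the next one is read.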

If you are curious, you can find an example of a library using this technique here: https://github.com/box/spout

Licensed under: CC-BY-SA with attribution