Apache POI Read xlsx NPE

Question

There doesn't have to be a Row object for every row. Think about it: when you start a new spreadsheet in Excel, you can have up to 1,048,576 rows, yet saving an empty spreadsheet produces a small file. If a reference were stored for every row, including rows that have nothing in them, the file would be absolutely huge. A row should only be stored if some kind of content is associated with it: a cell value, formatting, borders, etc. A row may appear blank but still carry some formatting, or it may have had content that has since been cleared. The same argument applies to Cells within a row: there's no reason to store a Cell for a column that isn't used. However, you can clear the content of a Cell without the Cell itself being removed; it then remains as a blank cell (Cell.CELL_TYPE_BLANK in older POI versions, CellType.BLANK in newer ones).
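To see this sparseness directly, here is a minimal sketch, assuming Apache POI 4.x or later with poi-ooxml on the classpath (the class and method names are made up for illustration). It creates only rows 0 and 1000 and a single cell in column D, then compares how many rows and cells physically exist with the highest indices in use.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class SparseRowsDemo {

    /** Returns {physical rows, last row index, physical cells in row 0, last cell num of row 0}. */
    static int[] describeSparseSheet() {
        try (Workbook wb = new XSSFWorkbook()) {
            Sheet sheet = wb.createSheet("sparse");

            // Only two rows are ever created; rows 1..999 simply do not exist.
            Row first = sheet.createRow(0);
            sheet.createRow(1000);

            // Only one cell (column D, index 3) is created in the first row.
            first.createCell(3).setCellValue("only cell");

            return new int[] {
                sheet.getPhysicalNumberOfRows(),  // rows that actually exist
                sheet.getLastRowNum(),            // highest row index in use
                first.getPhysicalNumberOfCells(), // cells that actually exist in row 0
                first.getLastCellNum()            // last cell index PLUS one
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = describeSparseSheet();
        System.out.printf("physical rows=%d, last row index=%d%n", r[0], r[1]);
        System.out.printf("physical cells=%d, last cell num=%d%n", r[2], r[3]);
    }
}
```

Note that getLastCellNum() is one more than the last cell index, which is a common source of off-by-one loops; iterating from 0 to getLastRowNum() will still hit indices whose getRow returns null.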

If a Row or Cell never existed, getRow or getCell returns null. If it exists but has no content, it may still have formatting that needs to be represented, so it won't be null. If it used to have content or formatting, it won't be null unless someone explicitly deletes it, either in Excel with Right Click -> Delete or in POI with Row.removeCell or Sheet.removeRow.
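The three cases above (never created, formatted or cleared but still present, explicitly removed) can be checked with a small sketch. It assumes Apache POI 4.x or later; the class name is hypothetical. It relies on the documented behavior that setCellValue with a null String turns the cell into a blank cell rather than deleting it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

import org.apache.poi.ss.usermodel.BorderStyle;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellStyle;
import org.apache.poi.ss.usermodel.CellType;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class NullVsBlankDemo {

    /**
     * Returns, in order: never-created cell is null; formatted-but-empty cell is BLANK;
     * cleared cell is BLANK; cell is null after removeCell; row is null after removeRow.
     */
    static boolean[] check() {
        try (Workbook wb = new XSSFWorkbook()) {
            Sheet sheet = wb.createSheet("demo");
            Row row = sheet.createRow(0);

            // Column 0 is never created, so getCell returns null.
            boolean neverCreated = row.getCell(0) == null;

            // Column 1 has a border but no value: it exists as a BLANK cell.
            CellStyle bordered = wb.createCellStyle();
            bordered.setBorderBottom(BorderStyle.THIN);
            row.createCell(1).setCellStyle(bordered);
            Cell formatted = row.getCell(1);
            boolean formattedIsBlank = formatted != null && formatted.getCellType() == CellType.BLANK;

            // Column 2 had a value that was cleared; the Cell stays, as BLANK.
            Cell cleared = row.createCell(2);
            cleared.setCellValue("temporary");
            cleared.setCellValue((String) null); // null value => blank cell, not a deleted one
            boolean clearedIsBlank = row.getCell(2) != null
                    && row.getCell(2).getCellType() == CellType.BLANK;

            // Explicitly removing the cell makes getCell return null again.
            row.removeCell(cleared);
            boolean removedCellIsNull = row.getCell(2) == null;

            // Explicitly removing the row makes getRow return null.
            sheet.removeRow(row);
            boolean removedRowIsNull = sheet.getRow(0) == null;

            return new boolean[] {neverCreated, formattedIsBlank, clearedIsBlank,
                    removedCellIsNull, removedRowIsNull};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(check()));
    }
}
```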

So if a row doesn't have any content, it makes sense that it could be null. As you have mentioned, you can always check whether the Row returned by getRow is null before accessing it, and likewise check whether the Cell returned by getCell is null. You can also supply a Row.MissingCellPolicy to getCell to control that method's behavior: CREATE_NULL_AS_BLANK will create the Cell for you if it didn't already exist. Creating cells only on demand is the whole point; imagine having 16,384 Cells for every Row when usually only a few at most are needed.
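The null-check approach can be wrapped in a small helper. This is a sketch, assuming Apache POI 4.x or later; readText is a hypothetical helper name, not part of POI. It uses DataFormatter to get the text as Excel would display it, so numeric and date cells don't need separate handling.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class SafeReadDemo {

    private static final DataFormatter FORMATTER = new DataFormatter();

    /** Hypothetical helper: displayed text of a cell, or "" if the row or cell is missing. */
    static String readText(Sheet sheet, int rowIndex, int colIndex) {
        Row row = sheet.getRow(rowIndex);
        if (row == null) {
            return ""; // row was never created (or was removed)
        }
        Cell cell = row.getCell(colIndex);
        if (cell == null) {
            return ""; // cell was never created (or was removed)
        }
        return FORMATTER.formatCellValue(cell);
    }

    /** Reads an existing value, a missing cell in an existing row, and a cell in a missing row. */
    static String[] demo() {
        try (Workbook wb = new XSSFWorkbook()) {
            Sheet sheet = wb.createSheet("data");
            sheet.createRow(0).createCell(0).setCellValue("hello");
            return new String[] {
                readText(sheet, 0, 0), // existing string cell
                readText(sheet, 0, 5), // row 0 exists, column 5 does not
                readText(sheet, 7, 0)  // row 7 does not exist at all
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String s : demo()) {
            System.out.println("[" + s + "]");
        }
    }
}
```

Checking the Row first matters: calling getCell on the result of getRow without a null check is exactly what throws the NPE when a row is absent.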

(There are other missing cell policies. RETURN_BLANK_AS_NULL does the opposite: if the cell exists but is blank, null is returned. The default, RETURN_NULL_AND_BLANK, just returns whatever is there, null or a blank Cell, without any other action.)
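To compare the three policies side by side, here is a sketch assuming Apache POI 4.x or later, where Row.MissingCellPolicy is an enum; the class name and the result format are made up for illustration. For each policy it asks for a blank cell (column 0) and a never-created cell (column 1), and records how many cells the row holds afterwards. A fresh workbook is used per policy because CREATE_NULL_AS_BLANK adds cells to the row as a side effect. If you want a policy applied everywhere, Workbook.setMissingCellPolicy changes the default used by the one-argument getCell.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Row.MissingCellPolicy;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class MissingCellPolicyDemo {

    /** "null" if getCell returned null, otherwise the cell's type name. */
    private static String describe(Cell cell) {
        return cell == null ? "null" : cell.getCellType().name();
    }

    /** Maps each policy name to what getCell returned for a blank and a missing cell. */
    static Map<String, String> compare() {
        Map<String, String> results = new LinkedHashMap<>();
        for (MissingCellPolicy policy : MissingCellPolicy.values()) {
            try (Workbook wb = new XSSFWorkbook()) {
                Row row = wb.createSheet("s").createRow(0);
                row.createCell(0); // column 0 exists but is blank; column 1 is never created

                String blank = describe(row.getCell(0, policy));
                String missing = describe(row.getCell(1, policy));
                results.put(policy.name(), String.format("blank=%s, missing=%s, cells=%d",
                        blank, missing, row.getPhysicalNumberOfCells()));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        compare().forEach((policy, result) -> System.out.println(policy + ": " + result));
    }
}
```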