There doesn't have to be a `Row` object for every row. Think about it: when you start a new spreadsheet in Excel, you can have up to 1,048,576 rows, yet saving an empty spreadsheet produces a small file. If every possible row were stored, even the ones with nothing in them, the file would be absolutely huge. A row should only be stored if there is some kind of content associated with it: cell values, formatting, borders, etc. A row may appear blank but still have some formatting, or it may have had content that is now gone. There is a similar argument for the `Cell`s in a row. There's no reason to have a `Cell` object for cells that aren't even used. But you can remove the content of a `Cell` without removing the `Cell` itself; it can remain as a `CELL_TYPE_BLANK` cell.
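The storage argument above can be illustrated with a toy model. This is only a sketch of the general idea of sparse storage; the class `SparseSheetSketch` and its methods are hypothetical and are not POI's API or its actual implementation.

```java
import java.util.TreeMap;

// Hypothetical toy model of sparse storage; not how POI is actually implemented.
public class SparseSheetSketch {
    public static final int MAX_ROWS = 1_048_576;

    // Row index -> (column index -> value). Only rows and cells that were
    // actually written get an entry; everything else is simply absent.
    private final TreeMap<Integer, TreeMap<Integer, String>> rows = new TreeMap<>();

    // Returns the stored value, or null if the row or the cell was never created.
    public String getCell(int rowIndex, int colIndex) {
        TreeMap<Integer, String> row = rows.get(rowIndex);
        return row == null ? null : row.get(colIndex);
    }

    // Creates the row entry on demand, then stores the value.
    public void setCell(int rowIndex, int colIndex, String value) {
        rows.computeIfAbsent(rowIndex, k -> new TreeMap<>()).put(colIndex, value);
    }

    public int storedRowCount() {
        return rows.size();
    }

    public static void main(String[] args) {
        SparseSheetSketch sheet = new SparseSheetSketch();
        sheet.setCell(0, 0, "header");
        sheet.setCell(MAX_ROWS - 1, 2, "last row");
        // Two rows are stored even though the sheet spans the full row range.
        System.out.println(sheet.storedRowCount());
        // A row that was never written yields null rather than an empty object.
        System.out.println(sheet.getCell(500, 0));
    }
}
```

The cost of the sheet grows with the number of cells actually used, not with the size of the grid, which is why a lookup has to be able to answer "nothing here".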
If a cell never existed, then it will be `null`. Even if it has no content, it may have formatting that needs to be represented, so it won't be `null`. If it used to have content or formatting, then it won't be `null` unless someone explicitly deletes it, either in Excel with Right Click -> Delete or in POI with `removeCell` or `removeRow`.
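Here is a minimal sketch of those states, assuming POI 4.1 or later with the `poi-ooxml` artifact on the classpath. In those versions the blank type is spelled `CellType.BLANK` rather than `CELL_TYPE_BLANK`, and `Cell.setBlank()` clears a cell's content. The class name `NullVersusBlankDemo` is just for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellType;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class NullVersusBlankDemo {
    // Each flag records whether one of the states described above was observed.
    public static boolean[] run() {
        try (Workbook wb = new XSSFWorkbook()) {
            Sheet sheet = wb.createSheet("demo");

            // A row that was never created is null.
            boolean rowMissing = sheet.getRow(0) == null;

            // A cell that was never created is null, even in an existing row.
            Row row = sheet.createRow(0);
            boolean cellMissing = row.getCell(0) == null;

            // Clearing the content leaves a blank cell behind, not null.
            Cell cell = row.createCell(0);
            cell.setCellValue("temporary");
            cell.setBlank();
            Cell cleared = row.getCell(0);
            boolean clearedIsBlank =
                    cleared != null && cleared.getCellType() == CellType.BLANK;

            // Explicit removal is what makes the cell (or row) null again.
            row.removeCell(cell);
            boolean cellRemoved = row.getCell(0) == null;
            sheet.removeRow(row);
            boolean rowRemoved = sheet.getRow(0) == null;

            return new boolean[] {
                rowMissing, cellMissing, clearedIsBlank, cellRemoved, rowRemoved
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run()));
    }
}
```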
If the row doesn't have any content, then it makes sense that it could be `null`. As you have mentioned, you can always check whether the `Row` returned by `getRow` is `null` before accessing it, and you can always check whether the `Cell` returned by `getCell` is `null` before accessing it. You can also supply a `Row.MissingCellPolicy` to `getCell` to control that method's behavior. `CREATE_NULL_AS_BLANK` will create the `Cell` for you if it didn't already exist. Creating cells on demand like this is far cheaper than the alternative: imagine having 16,384 `Cell`s for every `Row`, where usually only a few at most are needed.
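Both approaches might look like the sketch below, again assuming POI 4.x or later with `poi-ooxml` available. The helper `textAt` and the class `SafeCellAccessDemo` are hypothetical names, and returning an empty string for a missing row or cell is just one possible convention.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellType;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class SafeCellAccessDemo {
    // Hypothetical helper: checks the Row and the Cell for null before use
    // and treats anything missing as empty text.
    public static String textAt(Sheet sheet, int rowIndex, int colIndex) {
        Row row = sheet.getRow(rowIndex);
        if (row == null) {
            return "";
        }
        Cell cell = row.getCell(colIndex);
        if (cell == null) {
            return "";
        }
        return new DataFormatter().formatCellValue(cell);
    }

    public static boolean run() {
        try (Workbook wb = new XSSFWorkbook()) {
            Sheet sheet = wb.createSheet("demo");
            Row row = sheet.createRow(0);
            row.createCell(0).setCellValue("hello");

            boolean readsExisting = "hello".equals(textAt(sheet, 0, 0));
            boolean missingRowIsEmpty = textAt(sheet, 3, 3).isEmpty();
            boolean missingCellIsEmpty = textAt(sheet, 0, 3).isEmpty();

            // With CREATE_NULL_AS_BLANK, getCell never returns null: the missing
            // cell is created as a blank cell and stays in the row afterwards.
            Cell created = row.getCell(7, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK);
            boolean createdBlank =
                    created != null && created.getCellType() == CellType.BLANK;
            boolean nowStored = row.getCell(7) != null;

            return readsExisting && missingRowIsEmpty && missingCellIsEmpty
                    && createdBlank && nowStored;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Note that `CREATE_NULL_AS_BLANK` modifies the row as a side effect, so the explicit null checks are the better fit when you only want to read.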
(There are other missing cell policies. `RETURN_BLANK_AS_NULL` does the opposite; if the cell exists but is blank, then `null` will be returned. The default, `RETURN_NULL_AND_BLANK`, just returns whatever is there without any other action.)
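The difference between these two policies can be seen on a row with one blank cell and one cell that was never created. This is a sketch under the same assumptions as before (POI 4.x or later, `poi-ooxml` on the classpath); `MissingCellPolicyDemo` is an illustrative name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Row.MissingCellPolicy;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class MissingCellPolicyDemo {
    public static boolean[] run() {
        try (Workbook wb = new XSSFWorkbook()) {
            Row row = wb.createSheet("demo").createRow(0);
            row.createCell(0); // column 0 exists but is blank
            // column 1 is deliberately never created

            // RETURN_NULL_AND_BLANK: hands back whatever is stored.
            Cell blankAsIs = row.getCell(0, MissingCellPolicy.RETURN_NULL_AND_BLANK);
            Cell missingAsIs = row.getCell(1, MissingCellPolicy.RETURN_NULL_AND_BLANK);

            // RETURN_BLANK_AS_NULL: a blank cell is reported as if it were missing.
            Cell blankHidden = row.getCell(0, MissingCellPolicy.RETURN_BLANK_AS_NULL);
            Cell missingHidden = row.getCell(1, MissingCellPolicy.RETURN_BLANK_AS_NULL);

            return new boolean[] {
                blankAsIs != null,     // blank cell is returned
                missingAsIs == null,   // missing cell is null
                blankHidden == null,   // blank cell is turned into null
                missingHidden == null  // missing cell is still null
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run()));
    }
}
```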