Question

I have an empty spreadsheet, but when I'm accessing it with Apache POI (version 3.10), it says it has 1024 columns and 20 physical columns.
I really deleted all the cells, only some formatting remains, but no content.
And if I delete some columns with LibreOffice Calc (version 4.1.3.2), the number of columns only increases! What's going on?
Is there a reliable way to get the real number of columns (or cells in a row)?

import java.net.URL;
import org.apache.poi.ss.usermodel.*;

public class Test {
    public static void main(final String... args) throws Exception {
        final URL url = new URL("http://aditsu.net/empty.xlsx");
        final Workbook w = WorkbookFactory.create(url.openStream());
        final Row r = w.getSheetAt(0).getRow(0);
        System.out.println(r.getLastCellNum());
        System.out.println(r.getPhysicalNumberOfCells());
    }
}
Was it helpful?

Solution

After some more investigation, I think I figured out what's happening.

First, some terminology from POI: there are some cells that don't actually exist at all in the spreadsheet - those are called missing, or undefined/not defined. Then there are some cells that are defined, but have no value - those are called blank cells. Both types of cells appear empty in a spreadsheet program and can't be distinguished visually.

My spreadsheet has some blank cells that LibreOffice added at the end of the row (possibly a bug). When I delete columns, LibreOffice seems to shift the subsequent cells (including the blank ones) to the left, and adds more blank cells at the end (up to 1024).

And now the key part: neither getLastCellNum() nor getPhysicalNumberOfCells() ignore blank cells. getLastCellNum() gives the last defined cell, and getPhysicalNumberOfCells() gives the number of defined cells, both including blank cells. There doesn't seem to be any method available that skips blank cells. The javadoc for getPhysicalNumberOfCells() is somewhat misleading - "if only columns 0,4,5 have values then there would be 3", but it's actually counting blank cells too, which don't really have values.

So the only solution I found is to loop through the cells and check if they are blank.

Side note: getLastRowNum() and getFirstCellNum() are 0-based but getLastCellNum() is 1-based, wtf?

OTHER TIPS

Most likely you have some kind of formatting applied for you row. I have an empty xlsx file created with excel and method getRow produces null for empty rows.

@aditsu as per https://poi.apache.org/apidocs/dev/org/apache/poi/ss/usermodel/Row.html, getLastCellNum() gets the index of the last cell contained in this row PLUS ONE.

+1 for libreOffice strugle! it's a bug, and in my opinion is very random. I'm getting null randomly, and often helps if I delete EMPTY rows (bellow) and EMPTY columns (on the right side). ...

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top