Can't load xlsx file

Question 1

The problem was that some merged cells were, in fact, merged only with themselves. openpyxl always expected a merged cell reference to be a range of cells. A fix that ignores these meaningless merges has been added to the 2.0 branch.
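To illustrate what a "meaningless" merge looks like, here is a minimal sketch in plain Python. It is not the actual openpyxl fix; the helper name is_meaningful_merge is hypothetical, and it assumes merge references are simple strings such as "A1:B2" without sheet names or $ signs.

```python
def is_meaningful_merge(ref):
    """Return True if a merge reference spans more than one cell.

    Hypothetical helper for illustration only, not part of openpyxl.
    """
    start, sep, end = ref.partition(":")
    # "C3" has no ":" at all; "D4:D4" starts and ends on the same cell.
    return bool(sep) and start.upper() != end.upper()


refs = ["A1:B2", "C3", "D4:D4"]
kept = [r for r in refs if is_meaningful_merge(r)]
print(kept)
```

Only the first reference describes a real range, so it is the only one a reader would need to keep.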

Question 2

It appears that your .xlsx file is damaged or permanently corrupted. There could be many reasons. One possibility is that a file in another format was renamed to have an .xlsx extension, which does not make it a valid workbook. To confirm this behaviour, please try opening the file in Microsoft Excel.

I tried reading the file with openpyxl, xlrd and pandas, but none of them worked.

>>> import xlrd
>>> xlrd.open_workbook('test.xlsx')
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '<html> <'


>>> from openpyxl import load_workbook
>>> workbook = load_workbook(filename = "test.xlsx")
InvalidFileException: File is not a zip file

>>> import pandas 
>>> pandas.ExcelFile('test.xlsx')
InvalidFileException: File is not a zip file
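The errors above suggest the file is not a zip container and starts with HTML. A quick way to check this before blaming the library is to look at the file yourself. This is only a rough sketch using the standard library; the function name describe_xlsx and its return strings are made up for illustration.

```python
import zipfile


def describe_xlsx(path):
    """Give a rough guess at what kind of file `path` really is."""
    # A genuine .xlsx file is a zip archive.
    if zipfile.is_zipfile(path):
        return "zip container (plausibly a real .xlsx)"
    # Otherwise inspect the first bytes for some common impostors.
    with open(path, "rb") as fh:
        head = fh.read(512).lstrip().lower()
    if head.startswith((b"<html", b"<!doctype html")):
        return "HTML page saved with an .xlsx extension"
    if head.startswith(b"\xd0\xcf\x11\xe0"):
        return "legacy OLE2 file (possibly .xls)"
    return "unknown format"
```

Calling describe_xlsx('test.xlsx') on a file like the one in the question would be expected to report the HTML case, since xlrd found '<html> <' at the start of the file.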

Question 3

I ran into this issue while trying to open every file in a directory matching *.xlsx. I later found that the file causing the error was named ~$filename.xlsx. I'm guessing that Microsoft Excel indicates that a file is currently open by creating a file with the same name, prefixed with ~$. Once I closed the file in Excel, everything worked as expected.
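If you can't guarantee that nobody has a workbook open, you can skip those ~$ files when looping over the directory. A minimal sketch, assuming the files sit directly in one directory; the function name real_workbooks is made up for this example.

```python
from pathlib import Path


def real_workbooks(directory):
    """Yield .xlsx paths in `directory`, skipping Excel's ~$ lock files."""
    for path in sorted(Path(directory).glob("*.xlsx")):
        # Excel creates "~$name.xlsx" next to a workbook that is open.
        if path.name.startswith("~$"):
            continue
        yield path
```

Each yielded path can then be passed to load_workbook as usual.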

Question 4

I like openpyxl and use it for creating xlsx documents. This could be a bug, or an Excel feature used in your specific document that openpyxl does not support. I would report it to the openpyxl community.

Question 5

OK guys, I have reported this bug to the openpyxl developers and they have provided a quick fix for it. Here is the complete thread.

Question 6

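I have never tried openpyxl, but I use xlrd for reading Excel files (.xls and .xlsx). It works great. Note that xlrd 2.0 and later read only .xls files, so this applies to .xlsx only with older xlrd versions.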

See the examples and documentation at http://www.python-excel.org/