Question

I have a Python script that uses openpyxl to read an Excel file. It used to work fine, until I discovered that openpyxl wasn't installed properly, which caused errors when I ran the script outside my IDE. After fixing the installation, however, the script returns numeric values instead of the real cell values, and I don't understand where they come from.

The script:

import re
from openpyxl import load_workbook

wb = load_workbook(r'C:\test.xlsx', use_iterators=True)
ws = wb.get_sheet_by_name('Sheet1')

#Iterate through all rows
for row in ws.iter_rows(row_offset=1):
    for cell in row:
        #If the column == A, check if there's a website value
        if cell.column == 'A':
            try:
                print cell.internal_value
                self.match = re.match(regex, cell.internal_value)
                if self.match:
                    self.match = 'OK'
            except:
                pass

The print in the try block is added to see what is returned by the program, which is the following for the first five records:

0
1
31
49
143

It should be:

None
Website
www.coolblue.nl
www.bol.com
www.elektrosky.nl

Why does my script return these numeric values instead of the actual values?

EDIT: First 6 rows of my Excel file (first row is empty)

Website           | Sender      | Price  | Mark(s)          | Payment methods
www.coolblue.nl   | PostNL      | Free   | Thuiswinkel      | Ideal, Visa, Mastercard
www.bol.com       | PostNL      | Free   | Thuiswinkel      | Ideal, Visa, Mastercard
www.elektrosky.nl | PostNL      | € 5,00 | Webshop keurmerk | Ideal, Visa, Mastercard, Amex, PayPal
www.belsimpel.nl  | PostNL, DPD | € 6,95 | Thuiswinkel      | Ideal, Visa, Mastercard

Solution

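The problem is that you're using .internal_value. By default Excel stores strings in a shared lookup table and keeps only the index into that table in the cell, so .internal_value gives you the index rather than the text. You should be fine if you just use .value, which returns the actual string.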
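
To make the lookup table concrete, here is a minimal sketch of how a string cell is resolved, using only the standard library. It is not openpyxl's code: the two XML snippets are small hand-written stand-ins for the shared-strings and worksheet parts of an .xlsx file, and the helper names load_shared_strings and read_cells are hypothetical, made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by the SpreadsheetML parts inside an .xlsx archive.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Hand-written stand-in for the shared-strings part (the lookup table).
SHARED_STRINGS_XML = """\
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <si><t>Website</t></si>
  <si><t>www.coolblue.nl</t></si>
  <si><t>www.bol.com</t></si>
</sst>"""

# Hand-written stand-in for a worksheet part. t="s" marks a shared-string
# cell, and <v> holds the index into the table above, not the text itself.
SHEET_XML = """\
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="2"><c r="A2" t="s"><v>0</v></c></row>
    <row r="3"><c r="A3" t="s"><v>1</v></c></row>
    <row r="4"><c r="A4" t="s"><v>2</v></c></row>
  </sheetData>
</worksheet>"""


def load_shared_strings(xml_text):
    """Return the lookup table as a list; position in the list is the index."""
    root = ET.fromstring(xml_text)
    return [si.findtext("m:t", default="", namespaces=NS)
            for si in root.findall("m:si", NS)]


def read_cells(sheet_xml, shared_strings):
    """Yield (cell reference, stored value, resolved value) for each cell."""
    root = ET.fromstring(sheet_xml)
    for cell in root.iterfind(".//m:c", NS):
        raw = cell.findtext("m:v", namespaces=NS)
        if cell.get("t") == "s":
            # Shared-string cell: the stored value is an index into the table.
            resolved = shared_strings[int(raw)]
        else:
            resolved = raw
        yield cell.get("r"), raw, resolved


strings = load_shared_strings(SHARED_STRINGS_XML)
cells = list(read_cells(SHEET_XML, strings))
for ref, raw, resolved in cells:
    print(ref, raw, resolved)
```

The middle column is the kind of number the script in the question printed, and the last column is what .value is expected to hand back.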

Licensed under: CC-BY-SA with attribution