Question

When I try to extract data from an xlsx file, I get the encoding details along with the data.

Consider the code shown below:

column_number = 0
column_headers = []
#column_headers = sheet.row_values(row_number)
while column_number <= sheet.ncols - 1:
    column_headers.append(sheet.cell(row_number, column_number).value)
    column_number+=1

return column_headers

The output is:

[u'Rec#', u'Cyc#', u'Step', u'TestTime', u'StepTime', u'Amp-hr', u'Watt-hr', u'Amps', u'Volts', u'State', u'ES', u'DPt Time', u'ACR', u'DCIR']

I just want to extract the cell value, that is, the data without the "u'" attached to it. How can I get just that?


Solution

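The u prefix only marks the values as unicode strings in Python 2; it is part of how the list is displayed, not part of the cell data. If you need plain ASCII strings, you can encode each value. Note that 'ignore' silently drops any non-ASCII characters. Your updated code would be: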

column_headers.append((sheet.cell(row_number, column_number).value).encode('ascii','ignore'))
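
As a rough illustration (the header values are copied from the question's output; the names headers, joined and encoded exist only in this sketch), the u marker appears when Python 2 prints the list itself, not when the individual strings are printed or joined:

```python
from __future__ import print_function

# Sample headers taken from the question's output (a subset, for illustration).
headers = [u'Rec#', u'Cyc#', u'Step', u'Amp-hr']

# Python 2 shows u'...' when printing a list, because a list prints the repr
# of each item. Joining or printing the items individually shows only the text.
joined = ', '.join(headers)
print(joined)

# Encoding turns each unicode string into a byte string.
encoded = [h.encode('ascii', 'ignore') for h in headers]
```

So if the goal is only to display or write out the headers, encoding may not be needed at all.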

You can get the content of a cell with data.value. Also note that integers are imported as floats by default, so you may end up with an additional .0 at the end, which you can remove by converting the value with int(data.value).
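
Calling int() on every cell would fail on text and truncate real fractions, so a small guard can help. The following is only a sketch: clean_cell_value is a hypothetical helper, and the row values are made up for the example.

```python
def clean_cell_value(value):
    """Return an int for whole-number floats; leave anything else unchanged."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value

# Made-up row mixing whole numbers, a fraction and text.
row = [1.0, 2.5, u'Step', 3.0]
cleaned = [clean_cell_value(v) for v in row]
```

Here 1.0 and 3.0 become 1 and 3, while 2.5 and the text value pass through untouched.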

Other tips

Have you tried the following:

print data.value

In the new code, could you try this:

import unicodedata
...
output = []
for cell in column_headers:
    # normalize() alone still returns unicode; encode to get ASCII strings
    output.append(unicodedata.normalize('NFKD', cell).encode('ascii', 'ignore'))
return output

Please see this for more info: https://stackoverflow.com/a/1207479/2168278
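
To show why normalizing first can matter, here is a minimal sketch comparing the two approaches on an accented string. The helper name to_ascii and the sample text are assumptions made for this example, not part of the original code.

```python
import unicodedata

def to_ascii(text):
    # NFKD splits an accented character into a base letter plus a combining
    # mark; encoding with 'ignore' then drops the mark and keeps the letter.
    return unicodedata.normalize('NFKD', text).encode('ascii', 'ignore')

sample = u'Caf\xe9'  # "Café" with a precomposed e-acute

plain = to_ascii(sample)                    # keeps the base letter "e"
direct = sample.encode('ascii', 'ignore')   # drops the accented letter entirely
```

For headers that are pure ASCII, like the ones in the question, both approaches give the same result.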

Licensed under: CC-BY-SA with attribution