Question
I'm using xlrd to read a .xlsx file and save its contents to a .csv file. Everything works, except that all the int values in the .xlsx file are automatically converted to float in the .csv file. For example, if a cell of the .xlsx file contains 40, it appears as 40.0 in the .csv file.
I use the following code to read the file and convert it to .csv:
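# Python 2 code (xrange, csv file opened in binary mode)
import csv
import xlrd

# Raw strings keep the backslashes in the Windows paths literal
wb = xlrd.open_workbook(r'share\docs\excelcontrol2.xlsx')
sh = wb.sheet_by_name('Hoja1')
archivo_csv = open(r'share\docs\output.csv', 'wb')
wr = csv.writer(archivo_csv, delimiter=";")
for rownum in xrange(sh.nrows):
    # row_values returns every numeric cell as a float
    wr.writerow(sh.row_values(rownum))
archivo_csv.close()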
The .xlsx file contains int and float values, among other things. How can I save the .csv file so that the original format is kept? I mean, without changing the int values to float, and leaving the rest as it is?
Thanks in advance.
Solution
According to the xlrd docs, Excel cells of type XL_CELL_NUMBER are converted to the Python float type.
I think this is why your int values are converted to floats.
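As an illustration, here is a minimal sketch of one way to undo that conversion while writing the rows. It assumes that any number without a fractional part should be written as an int, which also means a cell that really held 40.0 comes out as 40. restore_ints is a hypothetical helper, not part of xlrd; the constant only mirrors xlrd.XL_CELL_NUMBER so the sketch runs without the library.

```python
XL_CELL_NUMBER = 2  # same value as xlrd.XL_CELL_NUMBER


def restore_ints(cell_types, cell_values):
    """Return the row with whole-number floats turned back into ints.

    Hypothetical helper: cell_types and cell_values are parallel lists,
    e.g. sh.row_types(rownum) and sh.row_values(rownum) from xlrd.
    """
    row = []
    for ctype, value in zip(cell_types, cell_values):
        is_number = ctype == XL_CELL_NUMBER and isinstance(value, float)
        if is_number and value.is_integer():
            value = int(value)  # 40.0 -> 40
        row.append(value)
    return row


# Types 2, 2, 1 stand for number, number, text
print(restore_ints([2, 2, 1], [40.0, 2.5, "abc"]))  # [40, 2.5, 'abc']
```

In the loop from the question, this would replace the writerow call with something like wr.writerow(restore_ints(sh.row_types(rownum), sh.row_values(rownum))). Note that Excel dates are also numbers in the file, so a date cell whose serial value is whole would be written as an int as well.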
OTHER TIPS
If there are specific columns which have only int values instead of floats, you'll have to convert those columns to that type before saving as CSV. The same is true for dates, since those are also stored as floats.
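A rough sketch of that per-column conversion, assuming you know in advance which column indexes hold ints and which hold dates. convert_row and its parameters are hypothetical names, and the date part assumes the workbook uses Excel's 1900 date system (serial numbers counted from 1899-12-30, valid for dates from March 1900 onward). With xlrd itself, xlrd.xldate_as_tuple(value, wb.datemode) would be the safer way to decode dates.

```python
from datetime import datetime, timedelta

# Assumption: 1900 date system, where serial 43831 is 2020-01-01
EXCEL_EPOCH_1900 = datetime(1899, 12, 30)


def convert_row(row, int_columns=(), date_columns=()):
    """Return a copy of row with the chosen columns converted.

    int_columns:  indexes whose whole-number floats become ints
    date_columns: indexes whose floats become ISO date strings
    """
    converted = list(row)
    for col in int_columns:
        value = converted[col]
        # Leave empty cells, text and non-whole numbers untouched
        if isinstance(value, float) and value.is_integer():
            converted[col] = int(value)
    for col in date_columns:
        value = converted[col]
        if isinstance(value, float):
            moment = EXCEL_EPOCH_1900 + timedelta(days=value)
            converted[col] = moment.date().isoformat()  # drops any time part
    return converted


row = [40.0, 2.5, 43831.0]
print(convert_row(row, int_columns=[0], date_columns=[2]))
# [40, 2.5, '2020-01-01']
```

Each row returned by sh.row_values(rownum) could be passed through convert_row before being handed to the csv writer.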
First, open your xlsx file with Excel or LibreOffice Calc and format the cells containing numbers, setting the number of decimal places for each.
My solution uses the openpyxl library. In this library, each Cell object has a number_format attribute, which reflects the number of decimals you set in the previous step. By reading this attribute, we can tell int from float.
Here is the code:
import csv
import logging
import os

from openpyxl import load_workbook

# Assumed value: the original snippet uses this constant without defining it.
CSV_DEFAULT_DELIMITER = ";"
logger = logging.getLogger(__name__)


def csv_from_excel(xlsx_file_path):
    """
    :param xlsx_file_path: String. Path of the excel file.
    Example:
    calling csv_from_excel("one/two/my_file.xlsx") creates the file "one/two/my_file.csv".
    """
    file_name, extension = os.path.splitext(xlsx_file_path)
    csv_file_path = file_name + ".csv"
    wb = load_workbook(filename=xlsx_file_path)
    # First sheet; wb.sheetnames and wb[name] replace the deprecated get_sheet_* methods
    worksheet = wb[wb.sheetnames[0]]
    content = []
    for row in worksheet.iter_rows():
        my_row = []
        for cell in row:
            value = cell.value
            the_format = cell.number_format
            if value_is_float_in_int_format(value, the_format):
                # Case where openpyxl gives 80 for a cell formatted with decimals: write 80.0
                value = float(value)
            my_row.append(value)
        content.append(my_row)
    write_csv_file(csv_file_path, content)


def value_is_float_in_int_format(value, the_format):
    """True if value was read as an int but the cell's number format is not an integer one."""
    # bool is a subclass of int, so exclude it explicitly
    is_int = isinstance(value, int) and not isinstance(value, bool)
    return is_int and the_format not in ("General", "0")


def write_csv_file(csv_file_path, content, delimiter=CSV_DEFAULT_DELIMITER):
    """
    :param csv_file_path: String. Path of the csv file to write on.
    :param content: List of lists. Rows of cell values to write.
    :param delimiter: Char. Delimiter for the csv file (can be ';' ',' or '\t' for tab)
    """
    logger.debug("FILE I/O : writing content in the file %s ", csv_file_path)
    with open(csv_file_path, "w") as a_file:
        writer = csv.writer(a_file, lineterminator='\n', delimiter=delimiter)
        writer.writerows(content)


my_xlsx_file = "/home/session/Documents/my_file.xlsx"
csv_from_excel(my_xlsx_file)  # this creates the csv file