Question

I'm using xlrd to read a .xlsx file and save its contents to a .csv file. Everything works, except that all the int values from the .xlsx file are automatically converted to float in the .csv file. This means that if a cell of the .xlsx file contains 40, it appears as 40.0 in the .csv file.

I use the following code to read the file and convert it to .csv:

    # Python 2 code (xrange, 'wb' mode for the csv module)
    import csv
    import xlrd

    wb = xlrd.open_workbook(r'share\docs\excelcontrol2.xlsx')  # raw strings keep the backslashes literal
    sh = wb.sheet_by_name('Hoja1')
    archivo_csv = open(r'share\docs\output.csv', 'wb')
    wr = csv.writer(archivo_csv, delimiter=";")
    for rownum in xrange(sh.nrows):
        wr.writerow(sh.row_values(rownum))
    archivo_csv.close()

The .xlsx file contains ints and floats, among other things. How can I save the .csv file so that it keeps the original format? I mean, without changing the ints to floats and leaving the rest as it is?

Thanks in advance.


Solution

According to the xlrd docs, Excel cells of type XL_CELL_NUMBER are returned as Python float values.

I think this is why your int values are converted to floats.
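
One workaround is to post-process each row before writing it: any float with no fractional part is turned back into an int. This is a minimal sketch, assuming Python 3 and that every whole-valued float should become an integer (so a cell that genuinely holds 40.0 would also be written as 40). The helpers `normalize_number` and `rows_to_csv` are hypothetical names; in the question's loop you would apply `normalize_number` to each value of `sh.row_values(rownum)`.

```python
import csv
import io


def normalize_number(value):
    """Return an int for whole-valued floats (40.0 -> 40); leave everything else unchanged."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


def rows_to_csv(rows, delimiter=";"):
    """Write rows (lists of cell values) to a CSV string, normalizing numbers first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    for row in rows:
        writer.writerow([normalize_number(v) for v in row])
    return buffer.getvalue()


# Values shaped like what xlrd's row_values() returns for numeric cells.
print(rows_to_csv([["name", 40.0, 2.5], ["total", 7.0, "n/a"]]))
# name;40;2.5
# total;7;n/a
```

Writing to an `io.StringIO` only keeps the sketch self-contained; with a real file you would pass the open file object to `csv.writer` instead.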

OTHER TIPS

If specific columns contain only int values rather than floats, you'll have to convert those columns to int before saving as CSV. The same is true for dates, since xlrd also returns those as floats.
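
Converting by column could look like the sketch below, assuming you already know which column indices hold ints and which hold dates, and that the workbook uses Excel's default 1900 date system. `convert_columns` and `excel_serial_to_date` are hypothetical helpers; with xlrd you could instead call `xlrd.xldate_as_tuple(value, wb.datemode)`, which also handles the 1904 date system.

```python
import datetime

# Assumption: Excel's default 1900 date system. Excel counts a nonexistent
# 1900-02-29, so for serials >= 61 the epoch 1899-12-30 yields the correct date.
EXCEL_EPOCH_1900 = datetime.datetime(1899, 12, 30)


def excel_serial_to_date(serial):
    """Convert an Excel date serial (float) to a datetime.date (1900 system, serial >= 61)."""
    return (EXCEL_EPOCH_1900 + datetime.timedelta(days=serial)).date()


def convert_columns(rows, int_columns=(), date_columns=()):
    """Return new rows with the given column indices converted to int or date.

    Only float values are converted, so header rows with strings pass through unchanged.
    """
    converted = []
    for row in rows:
        new_row = list(row)
        for i in int_columns:
            if isinstance(new_row[i], float):
                new_row[i] = int(new_row[i])  # assumes these columns hold whole numbers
        for i in date_columns:
            if isinstance(new_row[i], float):
                new_row[i] = excel_serial_to_date(new_row[i])
        converted.append(new_row)
    return converted


rows = [[1.0, 43831.0, 2.5], [2.0, 45292.0, 3.75]]
print(convert_columns(rows, int_columns=[0], date_columns=[1]))
# [[1, datetime.date(2020, 1, 1), 2.5], [2, datetime.date(2024, 1, 1), 3.75]]
```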

First, open your .xlsx file in Excel or LibreOffice Calc and format the cells containing numbers:

  • if the number is an int, set 0 decimals,
  • if the number is a float, set at least 1 decimal.

My solution uses the openpyxl library. In this library, each Cell object has a number_format attribute, which reflects the number of decimals you have just set. By reading this attribute, we can distinguish ints from floats.
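
The int/float rule above can be isolated in a small pure function that reads the decimal count from a format string such as "0" or "0.00". This is a sketch, assuming only simple digit-placeholder formats (date and multi-section formats are not handled); `decimals_in_format` and `python_type_for` are hypothetical helpers, not part of openpyxl, and you would pass them `cell.number_format`.

```python
import re


def decimals_in_format(number_format):
    """Count decimal places in a simple number format such as "0", "0.0" or "#,##0.00".

    Returns None for "General", which has no fixed number of decimals.
    """
    if number_format == "General":
        return None
    match = re.search(r"\.([0#]+)", number_format)
    return len(match.group(1)) if match else 0


def python_type_for(number_format):
    """Map the format to a type: 0 decimals -> int, 1 or more -> float, "General" -> None."""
    decimals = decimals_in_format(number_format)
    if decimals is None:
        return None  # keep whatever type the reader returned
    return int if decimals == 0 else float


print(decimals_in_format("#,##0.00"), python_type_for("0"), python_type_for("0.0"))
# 2 <class 'int'> <class 'float'>
```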

Here is the code:

    import csv
    import logging
    import os

    from openpyxl import load_workbook

    logger = logging.getLogger(__name__)
    CSV_DEFAULT_DELIMITER = ";"  # not defined in the original snippet; ";" matches the question


    def csv_from_excel(xlsx_file_path):
        """
        Convert the first sheet of an .xlsx file to a .csv file next to it.

        :param xlsx_file_path: String. Path of the Excel file.

        Example:
        calling csv_from_excel("one/two/my_file.xlsx") creates the file "one/two/my_file.csv".
        """
        file_name, extension = os.path.splitext(xlsx_file_path)
        csv_file_path = file_name + ".csv"
        wb = load_workbook(filename=xlsx_file_path)
        # get_sheet_names()/get_sheet_by_name() are deprecated in newer openpyxl releases
        first_sheet = wb.sheetnames[0]
        worksheet = wb[first_sheet]
        content = []
        for row in worksheet.iter_rows():
            my_row = []
            for cell in row:
                value = cell.value
                the_format = cell.number_format
                # openpyxl returns 80 (int) even when the cell is formatted with decimals, e.g. "0.0"
                if value_is_float_in_int_format(value, the_format):
                    value = float(value)
                my_row.append(value)
            content.append(my_row)
        write_csv_file(csv_file_path, content)


    def value_is_float_in_int_format(value, the_format):
        """True if value was read as an int but its cell format shows decimals (e.g. "0.00")."""
        # bool is a subclass of int, so exclude it explicitly
        result = isinstance(value, int) and not isinstance(value, bool)
        result = result and not (the_format == "General" or the_format == "0")
        return result


    def write_csv_file(csv_file_path, content, delimiter=CSV_DEFAULT_DELIMITER):
        """
        :param csv_file_path: String. Path of the csv file to write to.
        :param content: List of lists. Rows of cell values to write.
        :param delimiter: Char. Delimiter for the csv file (e.g. ';', ',' or '\t' for tab).
        """
        logger.debug("FILE I/O : writing content in the file %s ", csv_file_path)
        # newline="" lets the csv module control line endings (Python 3)
        with open(csv_file_path, "w", newline="") as a_file:
            writer = csv.writer(a_file, lineterminator='\n', delimiter=delimiter)
            writer.writerows(content)


    my_xlsx_file = "/home/session/Documents/my_file.xlsx"
    csv_from_excel(my_xlsx_file)  # this creates the csv file
Licensed under: CC-BY-SA with attribution