Question

Apologies if this is a basic question, but let us say I have a tab delimited file named file.txt formatted as follows:

Label-A    [tab]    Value-1

Label-B    [tab]    Value-2

Label-C    [tab]    Value-3

[...]

Label-i    [tab]    Value-n

I want xlrd or openpyxl to add this data to the Excel worksheet named Worksheet in the file workbook.xlsx so that the cells contain the following values. I do not want to affect the contents of any part of workbook.xlsx other than these two columns.

A1=Label-A

B1=Value-1

A2=Label-B

B2=Value-2

[etc.]

EDIT: Solution

import sys
import csv
import openpyxl

# Read the tab-separated data from standard input.
tab_file = sys.stdin.readlines()

# Required arguments: the workbook path and the worksheet name.
try:
    workbook = sys.argv[1]
    write_sheet = sys.argv[2]
except IndexError:
    sys.exit("ERROR")

# Optional third argument: first column to write to.
# openpyxl 2.0+ numbers rows and columns from 1, so column A is 1.
try:
    first_col = int(sys.argv[3])
except (IndexError, ValueError):
    first_col = 1

tab_reader = csv.reader(tab_file, delimiter='\t')
xls_book = openpyxl.load_workbook(filename=workbook)
xls_sheet = xls_book[write_sheet]
for row_index, row in enumerate(tab_reader, start=1):
    # Write each field of the line into consecutive columns of this row.
    for offset, value in enumerate(row):
        xls_sheet.cell(row=row_index, column=first_col + offset).value = value
xls_book.save(workbook)

Solution

Since you said you are used to working in Bash, I'm assuming you're using some kind of Unix/Linux, so here's something that will work on Linux.

Before pasting the code, I'd like to point out a few things:

Working with Excel in Unix (and Python) is not that straightforward. For instance, you can't open an Excel sheet for reading and writing at the same time (at least, not as far as I know, although I must admit that I have never worked with the openpyxl module). Python has two well-known modules (that I am used to working with :-D ) for handling Excel sheets: one for reading them (xlrd) and one for writing them (xlwt). With those two modules, if you want to modify an existing sheet, as I understand you want to do, you need to read the existing sheet, copy it to a writable sheet and edit that one. Check the question/answers in this other S.O. question, which explain it in some more detail.

Reading whatever-separated files is much easier thanks to the csv module (it's designed for comma-separated files, but it can easily be tweaked for other separators). Check it out.

Also, I wasn't very sure from your example whether the contents of the tab-separated file somehow indicate the row indexes on the Excel sheet or whether they're purely positional. When you say that in the tab-separated file you have Value-2, I wasn't sure if that 2 meant the second row of the Excel file or was just an example of some text. I assumed the latter (which is easier to deal with), so whatever Label/Value pair appears on the first row of your tab-separated file will be the first pair on the first row of the Excel file. If this is not the case, leave a comment and we will deal with it ;-)
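To make the csv point above a bit more concrete, here is a minimal sketch of reading tab-separated text with csv.reader. The data lives in an in-memory string (sample_text is a name made up for this illustration) instead of a real file, so the snippet runs on its own:

```python
import csv
import io

# Hypothetical sample data standing in for the tab-separated file.
sample_text = "Label-A\tValue-1\nLabel-B\tValue-2\nLabel-C\tValue-3\n"

# csv.reader accepts any iterable of lines; delimiter='\t' switches it
# from commas to tabs.
rows = list(csv.reader(io.StringIO(sample_text), delimiter='\t'))

# Each row is a plain list of strings: [label, value].
for label, value in rows:
    print(label, value)
```

With a real file you would pass the open file object to csv.reader in place of the io.StringIO wrapper.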

Ok, so let's assume the following scenario:

You have a tab-separated file like this:

stack37.txt:

Label-A Value-1
Label-B Value-2
Label-C Value-3

The Excel file you want to modify is stack37.xls. It only has one sheet (or, better said, the sheet you want to modify is the first one in the file) and it initially looks like this (in LibreOffice Calc):

[Screenshot: stack37.xls before running the script]

Now, this is the Python code (I stored it in a file called stack37.py, located in the same directory as the tab-separated file and the Excel file):

import csv
import xlwt
import xlrd
from xlutils import copy as xl_copy

with open('stack37.txt') as tab_file:
    tab_reader = csv.reader(tab_file, delimiter='\t')
    xls_readable_book = xlrd.open_workbook('stack37.xls')
    xls_writeable_book = xl_copy.copy(xls_readable_book)
    xls_writeable_sheet = xls_writeable_book.get_sheet(0)
    for row_index, row in enumerate(tab_reader):
        xls_writeable_sheet.write(row_index, 0, row[0])
        xls_writeable_sheet.write(row_index, 1, row[1])
    xls_writeable_book.save('stack37.xls')

After you run this code, the file stack37.xls will look like this:

[Screenshot: stack37.xls after running the script]

What I meant about not knowing exactly what you wanted to do with the values in your tab-separated file is that, regardless of what you name your items in there, the code will modify the first row of the Excel sheet, then the second, and so on. Even if your first value is called Value-2, the code above will not put that value on the second row of the Excel sheet, but on the first row. It just assumes the first line in the tab-separated file corresponds to the values to set on the first row of the Excel sheet.

Let me explain with a slightly modified example:

Let's assume your original Excel file looks like the original Excel file in my screenshot (the one full of | Hello-Ax | Bye-Bx |) but your tab-separated file now looks like this:

stack37.txt:

foo bar
baz baz2

After you run stack37.py, this is what your Excel file will look like:

[Screenshot: stack37.xls after running the script with the modified tab-separated file]

(see? first row of the tab-separated file goes to the first row in the Excel file)
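The positional rule can also be sketched without touching any Excel file. The helper below (cell_assignments is a name invented for this illustration) only computes which cell each value would land in, using spreadsheet-style addresses, and assumes every line has at least two fields:

```python
import csv
import io

def cell_assignments(tab_text):
    """Map each value to a cell address purely by its position in the file."""
    assignments = {}
    reader = csv.reader(io.StringIO(tab_text), delimiter='\t')
    # Spreadsheet rows are numbered from 1, so start counting there.
    for row_number, row in enumerate(reader, start=1):
        # First field goes to column A, second to column B, on the same row.
        assignments["A%d" % row_number] = row[0]
        assignments["B%d" % row_number] = row[1]
    return assignments

print(cell_assignments("foo\tbar\nbaz\tbaz2\n"))
```

The names of the items play no part in the result; only the line number in the tab-separated text decides the target row.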

UPDATE 1:

I'm trying the openpyxl module myself... Theoretically (according to the documentation) the following should work (note that I've changed the extensions to Excel 2007/2010 .xlsx):

import csv
import openpyxl

with open('stack37.txt') as tab_file:
    tab_reader = csv.reader(tab_file, delimiter='\t')
    xls_book = openpyxl.load_workbook(filename='stack37.xlsx')
    sheet_names = xls_book.get_sheet_names()
    xls_sheet = xls_book.get_sheet_by_name(sheet_names[0])
    for row_index, row in enumerate(tab_reader):
        cell_tmp1 = xls_sheet.cell(row = row_index, column = 0)
        cell_tmp1.value = row[0]
        cell_tmp2 = xls_sheet.cell(row = row_index, column = 1)
        cell_tmp2.value = row[1]
    xls_book.save('stack37_new.xlsx')

But if I do that, my LibreOffice refuses to open the newly generated file stack37_new.xlsx (maybe it's because my LibreOffice is old? I'm on Ubuntu 12.04, LibreOffice version 3.5.7.2... who knows, maybe it's just that)
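A note for readers on newer openpyxl releases: from openpyxl 2.0 onwards, rows and columns are numbered from 1, so row=0 or column=0 is rejected, and get_sheet_names() / get_sheet_by_name() are deprecated in favour of workbook.sheetnames and workbook[name]. Below is a minimal sketch under those assumptions. The names write_rows and update_workbook are made up for this example, and whether it also cures the LibreOffice problem described above is not something the sketch can confirm.

```python
import csv

def write_rows(sheet, rows, first_row=1, first_col=1):
    """Write rows into sheet starting at (first_row, first_col), both 1-based."""
    for row_offset, row in enumerate(rows):
        for col_offset, value in enumerate(row):
            cell = sheet.cell(row=first_row + row_offset,
                              column=first_col + col_offset)
            cell.value = value

def update_workbook(tab_path, xlsx_path, out_path):
    """Copy a tab-separated file into the first sheet of an .xlsx workbook."""
    # Imported here so write_rows can be tried without openpyxl installed.
    import openpyxl

    book = openpyxl.load_workbook(filename=xlsx_path)
    sheet = book[book.sheetnames[0]]  # first sheet, as in the attempt above
    with open(tab_path, newline='') as tab_file:
        write_rows(sheet, csv.reader(tab_file, delimiter='\t'))
    book.save(out_path)
```

Calling update_workbook('stack37.txt', 'stack37.xlsx', 'stack37_new.xlsx') would mirror the attempt above. Cells outside the written range are not touched by write_rows.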

OTHER TIPS

That's a job for VBA, but if I had to do it in Python I would do something like this:

import Excel
xl = Excel.ExcelApp(False)
wb = xl.app.Workbooks("MyWorkBook.xlsx")
wb.Sheets("Ass'y").Cells(1, 1).Value2 = "something"
wb.Save()

With a helper class in Excel.py like this:

import win32com.client

class ExcelApp(object):
    def __init__(self, createNewInstance, visible = False):
        self._createNewInstance=createNewInstance

        if createNewInstance:
            self.app = win32com.client.Dispatch('Excel.Application')
            if visible:
                self.app.Visible = True
        else:
            self.app = win32com.client.GetActiveObject("Excel.Application")

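    # Context-manager support, so the class can be used in a "with" block.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if self.app and self._createNewInstance:
            self.app.Quit()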

    def __del__(self):
        if self.app and self._createNewInstance:
            self.app.Quit()

    def quit(self):
        if self.app:
            self.app.Quit()

You should use the csv module in the standard library to read the file.

In openpyxl you can have something like this:

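import csv
from openpyxl import load_workbook

wb = load_workbook('workbook.xlsx')
ws = wb['Worksheet']  # sheet name from the question
with open('file.txt', newline='') as f:
    csvfile = csv.reader(f, delimiter='\t')
    # openpyxl rows and columns are 1-based, so start counting at 1.
    for idx, line in enumerate(csvfile, start=1):
        ws.cell(row=idx, column=1).value = line[0]
        ws.cell(row=idx, column=2).value = line[1]
wb.save("changed.xlsx")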
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow