Question

I am writing a converter code for our Data Department to convert fixed width files into delmited files. Normally we use import the file into Excel, use the text import wizard to set the field lengths, and then just save as a csv. However we have run into the limitation where we have started getting files that are millions of records long, and thus cant be imported into Excel. The files do not always have spaces in between the fields, espicially so between value fields like phone numbers or zip codes. The headers are also often filled completely in with no spaces.

A sample of a typical fixed width file we are dealing with:

SequenSack and PaFull Name****************************]JOB TITLE****************]HOSP NAME******************************]Delivery Address***********************]Alternate 1 Address********************]Calculated Text**********************************]POSTNET Bar
000001T1  P1     Sample A Sample                                                                                         123 Any Street                                                                  Anytown 12345-6789                                12345678900
000002T1  P1     Sample A Sample                       Director of Medicine                                              123 Any Street                          Po Box 1234                             Anytown 12345-6789                                12345678900

The program needs to break file into the following delimited fields:

Sequen
Sack and Pa
Full name
Job Title
Hosp Name
Delivery Address
Alternate Address 1
Calculated Text
POSTNET Bar

Each file as a slightly different width of each field depending on the rest of the job. What i am looking for is a GUI oriented delimiter much like the Excel import wizard for fixed width files. I am writing this tool in Python as a part of a larger tool that does many other file operations such as breaking up files into multiple up, reversing a file, converting from delimited to fixed width and check digit checking. I am using Tkinter for the rest of the tools and it would be ideal if the solution use it as well.

Any help appreciated

Was it helpful?

Solution

If I understand the problem correctly (and there's a good chance I don't...), the simplest solution might be to use a text widget.

Make the first line be a series of spaces the same length as the row. Use a couple of alternating tags (eg: "even" and "odd") to give each character an alternate color so they stand out from one another. The second line would be the header, and any remaining lines would be a couple lines of sample data.

Then, set up bindings on the first row to convert a space into an "x" when the user clicks on a character. If they click on an "x", convert it back to a space. They can then go and click on the character that is the start of each column. When the user is done, you can get the first line of the text widget and it will have an "x" for each column. You then just need a little function that translates that into whatever format you need.

It would look roughly like this (though obviously the colors would be different than what appears on this website)

      x          x                                     x  ...
SequenSack and PaFull Name****************************]JOB...
000001T1  P1     Sample A Sample                          ...

Here's a quick hack to illustrate the general idea. It's a little sloppy but I think it illustrates the technique. When you run it, click on an area in the first row to set or clear a marker. This will cause the header to be highlighted in alternate colors for each marker.

import sys
import Tkinter as tk
import tkFont

class SampleApp(tk.Tk):
    def __init__(self, *args, **kwargs):
        tk.Tk.__init__(self, *args, **kwargs)

        header = "SequenSack and PaFull Name****************************]JOB TITLE****************]HOSP NAME******************************]Delivery Address***********************]Alternate 1 Address********************]Calculated Text**********************************]POSTNET Bar"
        sample = "000001T1  P1     Sample A Sample                                                                                         123 Any Street                                                                  Anytown 12345-6789                                12345678900"
        widget = DelimiterWidget(self, header, sample)
        hsb = tk.Scrollbar(orient="horizontal", command=widget.xview)
        widget.configure(xscrollcommand=hsb.set)
        hsb.pack(side="bottom", fill="x")
        widget.pack(side="top", fill="x")

class DelimiterWidget(tk.Text):
    def __init__(self, parent, header, samplerow):
        fixedFont = tkFont.nametofont("TkFixedFont")
        tk.Text.__init__(self, parent, wrap="none", height=3, font=fixedFont)
        self.configure(cursor="left_ptr")
        self.tag_configure("header", background="gray")
        self.tag_configure("even", background="#ffffff")
        self.tag_configure("header_even", background="bisque")
        self.tag_configure("header_odd", background="lightblue")
        self.tag_configure("odd", background="#eeeeee")
        markers = " "*len(header)
        for i in range(len(header)):
            tag = "even" if i%2==0 else "odd"
            self.insert("end", " ", (tag,))
        self.insert("end", "\n")
        self.insert("end", header+"\n", "header")
        self.insert("end", samplerow, "sample")
        self.configure(state="disabled")
        self.bind("<1>", self.on_click)
        self.bind("<Double-1>", self.on_click)
        self.bind("<Triple-1>", self.on_click)

    def on_click(self, event):
        '''Handle a click on a marker'''
        index = self.index("@%s,%s" % (event.x, event.y))
        current = self.get(index)
        self.configure(state="normal")
        self.delete(index)
        (line, column) = index.split(".")
        tag = "even" if int(column)%2 == 0 else "odd"
        char = " " if current == "x" else "x"
        self.insert(index, char, tag)
        self.configure(state="disabled")
        self.highlight_header()
        return "break"

    def highlight_header(self):
        '''Highlight the header based on marker positions'''
        self.tag_remove("header_even", 1.0, "end")
        self.tag_remove("header_odd", 1.0, "end")
        markers = self.get(1.0, "1.0 lineend")

        i = 0
        start = "2.0"
        tag = "header_even"
        while True:
            try:
                i = markers.index("x", i+1)
                end = "2.%s" % i
                self.tag_add(tag, start, end)
                start = self.index(end)
                tag = "header_even" if tag == "header_odd" else "header_odd"
            except ValueError:
                break

if __name__ == "__main__":
    app = SampleApp()
    app.mainloop()

OTHER TIPS

edit: I now see that you are looking for a gui. I'll leave this incorrect answer for posterity.

import csv

def fixedwidth2csv(fw_name, csv_name, field_info, headings=None):
    with open(fw_name, 'r') as fw_in:
        with open(csv_name, 'rb') as csv_out: # 'rb' => 'r' for python 3
            wtr = csv.writer(csv_out)
            if headings:
                wtr.writerow(headings)
            for line in fw_in:
                wtr.writerow(line[pos:pos+width].strip() for pos, width in field_info)
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top