Question

I'm still learning Python and was wondering what a Pythonic way (i.e. less is more) of coding the following problem would be.

Due to the wonders of 1990s technology, a text file is dumped to the server every day with CSV tables in it. That's right: one text file with two CSV tables.

Goal: parse out the CSV tables and write them to two separate CSV files.

File looks like this:

start of file
blah blah
blah blah
blah blah

+--------
,tbl1, tbl1,
+--------
,data, data,
,data, data,
.....
,data,data
+--------
blah blah
blah blah
blah blah
+--------
,tbl2, tb2,
+--------
,data, data,
,data, data,
....
,data, data,
blah blah
blah blah

Issue: Table 1 varies in length.

I need to be able to extract table 1 no matter what the length of the table is and make it a CSV file.

I have

import csv

def lp_to_csv(in_file_name, out_filename):
    # Open the input and output files.
    inputfile = open(in_file_name, 'r')
    out_csvfile = open(out_filename, 'w', newline='')

    # Read in the correct lines (magic offsets for this particular dump).
    my_text = inputfile.readlines()[117:-8]
    del my_text[1]
    for row in my_text:
        # Cycle through to find the end.
        if row[0] != ",":
            print("excluded: " + row)

    # Parse the lines as CSV, using "," as the delimiter.
    in_txt = csv.reader(my_text, delimiter=',')
    # Hook a CSV writer to the output file.
    out_csv = csv.writer(out_csvfile)
    # Write the data.
    out_csv.writerows(in_txt)

    # Close up.
    inputfile.close()
    out_csvfile.close()

but that code only produces one CSV file, and it includes the second block of 'blah blah'.

I think I know how to make it two CSV files (create a subroutine that writes a CSV file from the my_text object), but how do I cut out the 'blah blah', and how do I trigger when to chop it into two tables?

Can anyone point me toward a nice Pythonic way?

ANSWERED: Very similar to the answer below. I created a subroutine that builds a list of the line numbers where the +----- delimiter appears. The delimiters always come in groups of three: table start, end of heading row, end of table.

Then I chunk that list into groups of three.

Then I run the table-making code on each group of three.

Pretty handy so far, and stateless, which is nice.
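The chunking described above can be sketched roughly like this. A minimal sketch, not the poster's actual code: the function names (extract_tables, write_table) are made up, and the delimiter spelling is assumed from the sample file.

```python
import csv

DELIM = '+--------'

def extract_tables(lines):
    # Find every line index where the +-------- delimiter appears.
    marks = [i for i, line in enumerate(lines) if line.strip() == DELIM]
    # Delimiters come in groups of three: table start, end of heading, end of table.
    tables = []
    for start, heading, end in zip(marks[0::3], marks[1::3], marks[2::3]):
        # The heading row sits between the first two delimiters,
        # the data rows between the second and third.
        tables.append(lines[start + 1:heading] + lines[heading + 1:end])
    return tables

def write_table(rows, out_filename):
    # Parse the raw lines as CSV and re-write them cleanly.
    with open(out_filename, 'w', newline='') as f:
        csv.writer(f).writerows(csv.reader(rows))
```

Because `zip` stops at the shortest slice, a trailing incomplete delimiter group is simply ignored rather than raising an error.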


Solution

No state variables, please.

delimiter = '+--------'
try:
    while True:
        skip_junk(src, delimiter)
        table = read_table_name(src, delimiter)
        process_table(table, src, delimiter)
except ...

and let any function raise an appropriate exception (EOF reached, table name missing, some other format violation, whatever else) to break the loop, and don't forget to handle them. Normally, each function simply returns when the delimiter is encountered.
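One way the exception-driven loop above could be fleshed out is sketched below. The three function names come from the answer, but their bodies, the one-CSV-file-per-table naming scheme, and the field parsing are all assumptions for illustration.

```python
import csv

DELIM = '+--------'

def skip_junk(src, delim):
    # Consume lines until an opening delimiter; raise when input runs out.
    for line in src:
        if line.strip() == delim:
            return
    raise EOFError('no more tables')

def read_table_name(src, delim):
    # The heading row sits between the first and second delimiters;
    # take its first non-empty field as the table name.
    name = next(src).strip().strip(',').split(',')[0].strip()
    if not name:
        raise ValueError('table name missing')
    if next(src).strip() != delim:
        raise ValueError('heading row not followed by a delimiter')
    return name

def process_table(name, src, delim):
    # Copy data rows into <name>.csv until the closing delimiter.
    with open(name + '.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        for line in src:
            if line.strip() == delim:
                return
            writer.writerow(line.strip().strip(',').split(','))

def split_tables(lines):
    src = iter(lines)
    try:
        while True:
            skip_junk(src, DELIM)
            name = read_table_name(src, DELIM)
            process_table(name, src, DELIM)
    except (EOFError, StopIteration):
        pass  # input exhausted: done
```

Because all three functions share one iterator, each picks up exactly where the previous one stopped, which is what keeps the loop stateless.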

OTHER TIPS

I noticed the tables begin and end with +--------. You can use that to parse the file.

You should keep one state variable - inside_csv. At first initialize it to False. Now, go over the lines of the file, one by one. If you see +--------, flip inside_csv (from False to True or the other way around).

If you see another line, check inside_csv. If it's true, write the line to a CSV file. If it's not, ignore it.

Don't forget to switch CSV files when you finish the first and start the second.
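A sketch of the flag idea, with one caveat: as the tip describes it, a table is bounded by a single pair of delimiters, but in this file each heading row has an extra delimiter under it, so a plain flip would close the table too early. The second flag below (heading_rule_next) is my own addition to swallow that heading delimiter; the function name is made up.

```python
DELIM = '+--------'

def split_with_flag(lines):
    inside_csv = False          # are we inside a table block right now?
    heading_rule_next = False   # the next delimiter closes the heading, not the table
    tables = []                 # one list of raw lines per table found
    for line in lines:
        if line.strip() == DELIM:
            if heading_rule_next:
                heading_rule_next = False   # delimiter under the heading: stay inside
            else:
                inside_csv = not inside_csv
                if inside_csv:
                    tables.append([])       # opening delimiter: start a new table
                    heading_rule_next = True
        elif inside_csv:
            tables[-1].append(line)
    return tables
```

Writing each collected list to its own CSV file then handles the "switch files" step: one output file per entry in tables.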

Untested, but here's an approach:

current_table = None
headers_seen = 0
tables = [[], []]
for line in inputfile:
    if line.strip() == '+--------':
        headers_seen += 1
        if headers_seen == 2:
            current_table = 0
        elif headers_seen == 5:
            current_table = 1
        else:
            current_table = None
        continue
    if current_table is None:
        continue
    tables[current_table].append(line)

Feels like it could be cleaner, but I wanted to handle an arbitrary number of tables; then I realized there's that weird header issue (the extra delimiter under each heading row), so the headers_seen == 2 / headers_seen == 5 test would have to be smarter. Since the delimiters come in groups of three per table, data always follows when headers_seen % 3 == 2.
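That modular-arithmetic idea can be made concrete for any number of tables. A hedged sketch (split_by_region is a made-up name): number the regions between delimiters; each table contributes three regions in order (heading, data, junk before the next table), so region % 3 picks out which kind of region we are in.

```python
DELIM = '+--------'

def split_by_region(lines):
    region = -1   # -1 means we are before the first delimiter
    tables = []
    for line in lines:
        if line.strip() == DELIM:
            region += 1
            if region % 3 == 0:
                tables.append([])   # a heading region opens a new table
            continue
        if region >= 0 and region % 3 in (0, 1):
            # region % 3 == 0: heading row; == 1: data rows; == 2: junk
            tables[-1].append(line)
    return tables
```

This keeps the heading row with its data, and it never needs the hard-coded 2-and-5 check, so a third or fourth table in the dump would come out for free.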

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow