Question

I'm still learning Python and was wondering what a Pythonic way (aka less is more) of coding the following problem would be.

Due to the wonders of 1990s technology, a text file is dumped to the server every day with CSV tables in it. That's right - one text file with two CSV tables.

Goal: to parse out the CSV tables and turn them into two separate CSV files.

The file looks like this:

start of file
blah blah
blah blah
blah blah

+--------
,tbl1, tbl1,
+--------
,data, data,
,data, data,
.....
,data,data
+--------
blah blah
blah blah
blah blah
+--------
,tbl2, tb2,
+--------
,data, data,
,data, data,
....
,data, data,
blah blah
blah blah

Issue: Table 1 varies in length.

I need to be able to extract table 1 no matter what the length of the table is and make it a CSV file.

I have

import csv

def lp_to_csv(in_file_name, out_filename):
    #open the input & output files
    inputfile = open(in_file_name, 'rb')
    out_csvfile = open(out_filename, 'wb')

    #read in the correct lines
    my_text = inputfile.readlines()[117:-8]
    del my_text[1]
    for row in my_text:
        #cycle through to find the end
        if row[0] != ",":
            print "excluded: " + row

    #convert to csv using "," as delimiter
    in_txt = csv.reader(my_text, delimiter=',')
    #hook csv writer to output file
    out_csv = csv.writer(out_csvfile)
    #write the data
    out_csv.writerows(in_txt)

    #close up
    inputfile.close()
    out_csvfile.close()

but that code only makes one CSV file, and it includes the second block of 'blah blah'.

I think I know how to make it two CSV files (create a subroutine that creates a CSV file from the my_text object), but how do I cut out the blah blah, and how do I trigger when to chop it into two tables?

Can anyone point me to a nice Pythonic way?

ANSWERED: Very similar to the answer below: I created a subroutine that builds a list of the positions where the delimiter +----- appears. The delimiters always come in groups of three: start, heading row, end.

Then I chunk the list into groups of three.

Then I run the table-making code on each group of three.

Pretty handy so far, and stateless, which is nice.
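For reference, here is a minimal sketch of that approach. It keeps the question's Python 2 style; the helper name lp_split_tables and the table_%d.csv output names are my own, and it assumes every table really is fenced by exactly three +-------- lines (start, heading row, end):

import csv

def lp_split_tables(in_file_name, out_name_pattern='table_%d.csv'):
    with open(in_file_name, 'rb') as src:
        lines = src.readlines()

    # positions of every delimiter line
    marks = [i for i, line in enumerate(lines) if line.startswith('+--------')]

    # chunk the positions into groups of three: start, heading row, end
    groups = [marks[i:i + 3] for i in range(0, len(marks), 3)]

    for n, (start, head, end) in enumerate(groups, 1):
        # heading row sits after the first delimiter, data rows after the second
        rows = [lines[start + 1]] + lines[head + 1:end]
        with open(out_name_pattern % n, 'wb') as out_csvfile:
            csv.writer(out_csvfile).writerows(csv.reader(rows))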

Was it helpful?

Solution

No state variables, please.

delimiter = '+--------'
try:
    while True:
        skip_junk(src, delimiter)
        table = read_table_name(src, delimiter)
        process_table(table, src, delimiter)
except ...

and let any function raise an appropriate exception (EOF reached, table name missing, some other format violation, whatever else) to break the loop - and don't forget to handle them. Normally, each function simply returns when the delimiter is encountered.
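For illustration, here is one way those three functions could be fleshed out. The bodies below are my own guesses at the intended behaviour (only the driver loop above comes from the answer), and deriving the output file name from the heading row is an assumption:

import csv

def skip_junk(src, delimiter):
    # discard lines until the opening delimiter; signal when the file is exhausted
    for line in src:
        if line.startswith(delimiter):
            return
    raise EOFError('no more tables in file')

def read_table_name(src, delimiter):
    # the heading row (e.g. ",tbl1, tbl1,") sits between the first and second delimiter
    heading = next(src)
    if not next(src).startswith(delimiter):
        raise ValueError('expected a delimiter after the heading row')
    return heading.replace(',', ' ').split()[0]   # e.g. 'tbl1' (assumed naming)

def process_table(name, src, delimiter):
    # copy data rows into <name>.csv until the closing delimiter (or end of file)
    with open(name + '.csv', 'wb') as out_csvfile:
        out_csv = csv.writer(out_csvfile)
        for line in src:
            if line.startswith(delimiter):
                return
            out_csv.writerows(csv.reader([line]))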

Other Tips

I noticed the tables begin and end with +--------. You can use that to parse the file.

You should keep one state variable - inside_csv. At first initialize it to False. Now, go over the lines of the file, one by one. If you see +--------, flip inside_csv (from False to True or the other way around).

If you see another line, check inside_csv. If it's true, write the line to a CSV file. If it's not, ignore it.

Don't forget to switch CSV files when you finish the first and start the second.
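A literal sketch of that toggle, assuming the file is read line by line and written out as-is (the input and table_%d.csv file names are my own choices):

inside_csv = False
table_num = 0
out = None

for line in open('dump.txt'):
    if line.startswith('+--------'):
        inside_csv = not inside_csv        # flip on every delimiter line
        if inside_csv:
            # entering a table: open the next output file
            table_num += 1
            out = open('table_%d.csv' % table_num, 'w')
        else:
            # leaving a table: switch files by closing the current one
            out.close()
        continue
    if inside_csv:
        out.write(line)

Note that in the sample file above, every heading row is followed by a second +-------- line, so a bare flip will toggle off again right after the heading; the counter in the approach below is one way around that.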

Untested, but here's an approach

current_table = 0
headers_seen = 0
tables = [[], []]                      # tables[0] -> table 1, tables[1] -> table 2
for line in inputfile:
    if line.startswith('+--------'):   # lines keep their trailing newline, so match the prefix
        headers_seen += 1
        if headers_seen == 2:          # table 1's data starts after its second delimiter
            current_table = 1
        elif headers_seen == 5:        # table 2's data starts after its second delimiter
            current_table = 2
        else:
            current_table = 0
        continue
    if not current_table:
        continue
    tables[current_table - 1].append(line)

Feels like it could be cleaner, but I wanted to handle an arbitrary number of tables; then I realized there's that weird header issue, so the headers_seen == 2 / == 5 check would have to be smarter (something like == 2, or greater than 4 and headers_seen % 2 == 1).
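If the file really does fence every table with three delimiters, as the question's own solution assumes, one way to make that check generic is a small helper like this (my own sketch, not tested against the real file):

def classify(headers_seen):
    # Data rows sit between the 2nd and 3rd delimiter of each group of three,
    # i.e. while headers_seen % 3 == 2.  headers_seen // 3 is the 0-based table index.
    if headers_seen % 3 == 2:
        return headers_seen // 3
    return None          # not inside a table's data section

The loop would then append to tables[classify(headers_seen)] whenever classify returns an index, growing tables on demand instead of hard-coding two lists.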

License: CC-BY-SA with attribution
Not affiliated with StackOverflow