Question

Python beginner here. I am using the matplotlib library to make graphs from tab-delimited text files. I want my script to be flexible, so that it can take different types of data files and turn them into graphs. The key issue is that different text files have different numbers of header lines before the data begins. I would like Python to figure out how many lines the header takes up, then remove them.

I think this could be done in two ways:

1) Count the most frequently occurring number of columns/elements per line in the file, as most lines will be rows containing the data of interest. Then, with a for loop, remove all lines that do not contain this number of columns.

2) Count the number of columns/elements in the last row of the file, then remove any rows not matching this length. Since the last row is always data in the files I use, this would work too.

If anyone can show me a short way to do this in Python that I can integrate into my script, that would really help a lot.

Many thanks,

Rubal


Solution

1)

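# lines = list of lines read from the file
line_store = {}  # maps token count -> list of lines with that many tokens
for line in lines:
    tokens = line.split('\t')
    if len(tokens) in line_store:
        line_store[len(tokens)].append(line)
    else:
        line_store[len(tokens)] = [line]
# Pick the largest group, i.e. the lines with the most common token count.
most = []
for line_group in line_store.values():
    if len(line_group) > len(most):
        most = line_group

most will end up being the list you want: the lines that share the most common number of tokens.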
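
The same grouping can also be written more compactly with collections.Counter from the standard library. This is only a minimal sketch, assuming lines is a list of strings already read from the file; the function name keep_most_common_width and the sample data are made up for illustration.

```python
from collections import Counter


def keep_most_common_width(lines, sep="\t"):
    """Return the lines whose column count is the most frequent one.

    Hypothetical helper, not part of the original answer.
    """
    if not lines:
        return []
    # Count how many lines have each number of columns.
    widths = Counter(len(line.split(sep)) for line in lines)
    # most_common(1) returns [(width, count)] for the most frequent width.
    common_width = widths.most_common(1)[0][0]
    return [line for line in lines if len(line.split(sep)) == common_width]


# Made-up sample: two header lines followed by three data rows.
lines = ["My experiment", "date\t2012", "1\t2\t3", "4\t5\t6", "7\t8\t9"]
print(keep_most_common_width(lines))
```

It has the same weaknesses as the loop above: a header with the same number of tokens as the data rows is kept, and the result is wrong if header lines outnumber data lines.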

2)

# lines = list of lines read from the file
tokens_in_last_line = len(lines[-1].split('\t'))
lines_with_correct_number_of_tokens = []
# Check every line except the last one, which is added back afterwards.
for line in lines[:-1]:
    if len(line.split('\t')) == tokens_in_last_line:
        lines_with_correct_number_of_tokens.append(line)
lines_with_correct_number_of_tokens.append(lines[-1])

`lines_with_correct_number_of_tokens` will have all your lines with the same number of tokens as the last line in the file.

Both these solutions have major flaws though. (1) will choke if you have a header with the same number of tokens as the content rows, or if the header rows outnumber the content rows. (2) will choke if you have a footer line, or if the last line is blank, or, again, if your header rows have the same number of tokens as your content rows. I think you should see if you could come up with a more elegant solution.
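
One possible way around some of these flaws is to walk backwards from the last non-blank line and stop at the first line that does not look like data. The sketch below rests on two assumptions that do not come from the question: every field in a data row parses as a number, and the data rows form one contiguous block at the end of the file. The helpers is_data_row and count_header_lines are hypothetical names.

```python
def is_data_row(line, width, sep="\t"):
    """True if the line has `width` fields and every field parses as a float."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) != width:
        return False
    try:
        for field in fields:
            float(field)  # assumption: data fields are numeric
    except ValueError:
        return False
    return True


def count_header_lines(lines, sep="\t"):
    """Return the index of the first line of the trailing data block."""
    # Ignore blank lines at the end of the file.
    end = len(lines)
    while end > 0 and not lines[end - 1].strip():
        end -= 1
    if end == 0:
        return 0
    width = len(lines[end - 1].rstrip("\n").split(sep))
    # Walk upwards while the lines still look like data rows.
    start = end
    while start > 0 and is_data_row(lines[start - 1], width, sep):
        start -= 1
    return start


# Made-up sample: a title, a column-name row, two data rows, a blank line.
sample = ["Sample A\n", "time\tvolts\n", "0.0\t1.5\n", "0.1\t1.7\n", "\n"]
n_header = count_header_lines(sample)
data = [line for line in sample[n_header:] if line.strip()]
```

Here the column-name row is skipped even though it has the same number of tokens as the data rows, because its fields are not numeric. A non-numeric footer line would still defeat this sketch, since the walk starts from the last non-blank line.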

License: CC-BY-SA with attribution
Not affiliated with StackOverflow