Take column average of one file, repeat for all .txt files, write all averages to one file in Python

https://stackoverflow.com/questions/22711880

23-06-2023

Question

I have approximately 40 .txt files, each containing two space-delimited columns of data (a previous script converted the values to str). I would like to take the average of the second column of each .txt file and write all the averages to one output file. The files also need to be read in numerical order, e.g. file1.txt, file2.txt.

At the moment I have two scripts: a) one that reads the last line of every .txt file and b) one that takes the average of a single file. When I try to combine them, I get an error saying either that the string cannot be converted to a float or that the list index is out of range. To rule out blank lines in the .txt files as the cause of the latter, I also tried line.strip(), to no avail.

a) code that reads in the last line of all .txt files:

import sys
import os
import re
import glob

numbers = re.compile(r'(\d+)')
def numericalSort(value):
    parts = numbers.split(value)
    parts[1::2] = map(int, parts[1::2])
    return parts

list_of_files = glob.glob("./*.txt")

for file in sorted(list_of_files, key=numericalSort):
    infiles =  open(file, "r")

    outfile = open("t-range","a")

    contents = [""]
    columns = []

    counter = 0

    for line in infiles:
        counter += 1
        contents.append(line)

    for line in contents:
        if line.startswith("20000"):
            columns.append(float(line.split()[1]))
            print columns

    counter1 = 0
    for line in columns:
        counter1 += 1
        outfile.write(','.join(map(str,columns))+"\n")

infiles.close()
outfile.close()

b) script that takes average value of one file:

data = open("file.txt","r").read().split()
s = sum ([float(i) for i in data])
average = s / len(data)

c) combined script

import sys
import os
import re
import glob

numbers = re.compile(r'(\d+)')
def numericalSort(value):
    parts = numbers.split(value)
    parts[1::2] = map(int, parts[1::2])
    return parts

list_of_files = glob.glob("./totalenergies*")

for file in sorted(list_of_files, key=numericalSort):
    infiles =  open(file, "r")
    outfile = open("t-range","a")

    contents = [""]
    columns = []

    counter = 0

    for line in infiles:
        counter += 1
        contents.append(float(line.split()[1]))

        contents = ([float(i) for i  in contents])
        s = sum(contents)
        average = s / len(contents)
        columns.append(average)

        counter1 = 0
        for line in columns:
            counter1 += 1
            outfile.write("\n".join(map(str,columns)) + "\n")


infiles.close()
outfile.close()

This last script gives a "could not convert string to float" error; the traceback points to contents = ([float(i) for i in contents]).

Solution

This is the line that is giving you the error:

outfile.write("".join(map(str,columns)+"\n"))

The last part of a traceback usually shows the line number in your script that raised the error, so check that line first.
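As a small illustration of reading that information, the sketch below triggers the same kind of ValueError and pulls the innermost frame out of the traceback. It assumes Python 3 (where the exception carries `__traceback__`), and `to_floats` is a hypothetical helper, not code from the question.

```python
import traceback


def to_floats(values):
    """Convert every item to float; fails on anything non-numeric."""
    result = []
    for v in values:
        result.append(float(v))  # float("") raises ValueError
    return result


try:
    to_floats(["1.5", ""])
except ValueError as exc:
    # The last entry of the extracted traceback is the frame that raised.
    innermost = traceback.extract_tb(exc.__traceback__)[-1]
    failing_function = innermost.name
    failing_lineno = innermost.lineno
    message = str(exc)

print("raised in %s, line %d: %s" % (failing_function, failing_lineno, message))
```

The function name and line number printed here are the same details that appear at the bottom of an uncaught traceback.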

As the line currently reads, +"\n" sits inside the call to join, so it is applied to the list returned by map. It should be applied to the string that join returns, before that string is passed to write:

outfile.write("".join(map(str,columns)) + "\n")
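To see the difference in isolation, here is a minimal sketch with made-up values in `columns`. Under Python 2, map returns a list; the `list(...)` wrapper is only there so that the same failure can be reproduced under Python 3.

```python
columns = [1.5, 2.25]  # sample averages, invented for illustration

# "\n" inside the join() call: a list cannot be concatenated with a string.
try:
    "".join(list(map(str, columns)) + "\n")
    error = None
except TypeError as exc:
    error = exc

# "\n" outside the join() call: join first, then append the newline.
line = "".join(map(str, columns)) + "\n"

print(repr(error))
print(repr(line))
```

The first form never reaches join at all, because the concatenation fails while the argument is being built.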

If you intended to write each average in columns on its own line, with a trailing newline after the last one, use "\n" as the separator:

outfile.write("\n".join(map(str,columns)) + "\n")
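The effect of the separator is easy to check on a short list. The values below are invented for illustration.

```python
columns = [1.5, 2.25, 3.0]  # sample averages, invented for illustration

# Empty separator: the numbers run together on a single line.
run_together = "".join(map(str, columns)) + "\n"

# Newline separator: one number per line, plus a final newline.
one_per_line = "\n".join(map(str, columns)) + "\n"

print(repr(run_together))
print(repr(one_per_line))
```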

More: http://docs.python.org/2/library/stdtypes.html#str.join
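Separately from the join issue, the combined script in the question starts with contents = [""], and float("") raises ValueError ("could not convert string to float"), which matches the message quoted there. Putting the pieces together, the following is a minimal Python 3 sketch of the whole task, not a drop-in replacement for the original scripts. The function names `natural_key`, `column_average` and `write_averages` are hypothetical, and the sketch assumes whitespace-delimited files whose second column is numeric.

```python
import glob
import os
import re

_DIGITS = re.compile(r"(\d+)")


def natural_key(path):
    """Split a file name into text and integer parts so file2 sorts before file10."""
    parts = _DIGITS.split(os.path.basename(path))
    parts[1::2] = [int(p) for p in parts[1::2]]
    return parts


def column_average(path, column=1):
    """Average one whitespace-delimited column, skipping blank or short lines."""
    values = []
    with open(path) as infile:
        for line in infile:
            fields = line.split()
            if len(fields) > column:
                values.append(float(fields[column]))
    if not values:
        raise ValueError("no numeric data in %s" % path)
    return sum(values) / len(values)


def write_averages(pattern, out_path):
    """Write one average per line, in natural file order, and return the averages."""
    paths = sorted(glob.glob(pattern), key=natural_key)
    averages = [column_average(p) for p in paths]
    with open(out_path, "w") as outfile:
        outfile.write("\n".join(map(str, averages)) + "\n")
    return averages
```

A call such as write_averages("./*.txt", "t-range") would mirror the file pattern and output name used in the question. Unlike the original scripts, this sketch opens the output file once in "w" mode instead of appending on every iteration, and it computes each average only after the whole file has been read.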

Licensed under: CC-BY-SA with attribution