I'm trying to compute the average for a set of predictions (values between 0 and 1) that my model has created. The predictions are shifted, however, so that for 10 data values they come in overlapping sets of three, like so:

0.6825 0.7022 0.7023
0.6193 0.6410 0.6389
0.5934 0.6159 0.6145
0.5966 0.6191 0.6184
0.3331 0.3549 0.3500
0.1862 0.2015 0.1999
0.1165 0.1270 0.1267
0.1625 0.1761 0.1740

These values correspond to the indexes as follows:

1   2   3
2   3   4
3   4   5
4   5   6
5   6   7
6   7   8
7   8   9
8   9   10

I'm trying to write a script such that I read the input line by line and output an average of the prediction values as appropriate for each index (average the two twos together, the three threes, the three fours, etc) such that there is one per line:

0.6825
0.66075
etc...

However, I'm not sure how to read in the data such that I'm able to differentiate between 0.6825 and 0.7022 in line 1, for example. How do I read in the information such that I'm able to store it all in a contiguous array?

As far as logic goes, I figure that I can have special cases for the first, second, last, and second-to-last values, and run a loop for the rest.


Solution

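def linereader(fName):
    """Yield each line of the file as a list of floats."""
    with open(fName) as f:
        for line in f:
            # Build a real list so the values can be indexed
            # (in Python 3, map() returns a one-shot iterator).
            yield [float(v) for v in line.split()]


result = []
data = []
for i, line in enumerate(linereader('values.txt')):
    data.append(line)
    if i == 0:
        # Index 1 appears only once, in the first row.
        result.append(data[0][0])
    elif i == 1:
        # Index 2 appears twice: rows 1 and 2.
        result.append((data[0][1] + data[1][0])/2)
    else:
        # Every interior index appears in three consecutive rows.
        result.append((data[i-2][2] + data[i-1][1] + data[i][0])/3)

# Second-to-last index: last value of the second-to-last row
# and middle value of the last row.
result.append((data[-2][2] + data[-1][1])/2)
# The last index appears only once, in the last row.
result.append(data[-1][2])

print(result)
#[0.6825, 0.66075, 0.6455666666666667, 0.6171333333333333, 0.5222333333333333, 0.3865, 0.22266666666666668, 0.16313333333333332, 0.1514, 0.174]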
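
If you would rather avoid the special cases for the first and last indexes, one option is to keep two plain lists, one of running sums and one of counts, and divide at the end. This is only a sketch: the function name average_shifted is made up for illustration, it takes rows that are already parsed into lists of floats, and it assumes every row is non-empty and that row i (0-based) starts at index i.

```python
def average_shifted(rows):
    """Average overlapping predictions; row i (0-based) covers indexes i, i+1, ..."""
    sums = []    # running total of the predictions seen for each index
    counts = []  # how many predictions were seen for each index
    for i, row in enumerate(rows):
        for offset, value in enumerate(row):
            index = i + offset
            # Grow both lists the first time an index shows up
            # (assumes every row is non-empty, so there are no gaps).
            if index == len(sums):
                sums.append(0.0)
                counts.append(0)
            sums[index] += value
            counts[index] += 1
    return [total / count for total, count in zip(sums, counts)]


# First three rows of the data from the question.
rows = [
    [0.6825, 0.7022, 0.7023],
    [0.6193, 0.6410, 0.6389],
    [0.5934, 0.6159, 0.6145],
]
averages = average_shifted(rows)
for value in averages:
    print(value)
```

Because each index is divided by however many values it actually received, the same code works for any window width, not just three.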

Other tips

This is how I would go about it:

  • First aggregate the values into an index:values dictionary (using a defaultdict)
  • Then create a new dictionary that holds, for each index, the average of the values collected for it

For example:

from collections import defaultdict
from pprint import pprint

aggregated = defaultdict(list)

with open('predictions.txt') as data:
    for i, line in enumerate(data, start=1):
        values = [float(v) for v in line.split()]
        for offset, value in enumerate(values):
            index = i + offset
            aggregated[index].append(value)


averaged = {}

for index, values in aggregated.items():
    averaged[index] = sum(values) / float(len(values))

pprint(averaged)

Output:

{1: 0.6825,
 2: 0.66075,
 3: 0.6455666666666667,
 4: 0.6171333333333333,
 5: 0.5222333333333333,
 6: 0.3865,
 7: 0.22266666666666668,
 8: 0.16313333333333332,
 9: 0.1514,
 10: 0.174}

Also see the docs for defaultdict and enumerate.
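
The question asks for one average per line, in index order. A plain dict does not guarantee key order on older Python versions, so a small sketch of that last step is to iterate over the sorted keys. The dictionary below is just a hand-copied subset of the averaged output above, written out of order on purpose; the four-decimal format is my assumption about the desired precision.

```python
# A few entries copied from the averaged dictionary, deliberately unordered.
averaged = {10: 0.174, 2: 0.66075, 1: 0.6825}

# Sort by index so the lines come out in index order,
# then format each average on its own line.
output_lines = ["%.4f" % averaged[index] for index in sorted(averaged)]
print("\n".join(output_lines))
```

Swap the format string for "%d %.4f" % (index, averaged[index]) if you also want the index on each line.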

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow