def linereader(fName):
    # Yield each line of the file as a list of floats
    with open(fName) as f:
        for line in f:
            yield [float(v) for v in line.split()]

result = []
data = []
for i, line in enumerate(linereader('values.txt')):
    data.append(line)
    if i == 0:
        # index 1 has a single prediction
        result.append(data[0][0])
    elif i == 1:
        # index 2 has two predictions
        result.append((data[0][1] + data[1][0])/2)
    else:
        # index i+1 averages one value from each of the last three lines
        result.append((data[i-2][2] + data[i-1][1] + data[i][0])/3)
# second-to-last index: two predictions, one from each of the last two lines
result.append((data[-2][2] + data[-1][1])/2)
# last index: a single prediction
result.append(data[-1][2])
print(result)
#[0.6825, 0.66075, 0.6455666666666667, 0.6171333333333333, 0.5222333333333333, 0.3865, 0.22266666666666668, 0.16313333333333332, 0.1514, 0.174]
Python: Shifted Averaging And Space Delimiting
Question
I'm trying to figure out the average for a set of predictions (values between 0 and 1) that my model has created. The prediction values are shifted, however, such that if there were 10 data values, the predictions are made in sets of three like so:
0.6825 0.7022 0.7023
0.6193 0.6410 0.6389
0.5934 0.6159 0.6145
0.5966 0.6191 0.6184
0.3331 0.3549 0.3500
0.1862 0.2015 0.1999
0.1165 0.1270 0.1267
0.1625 0.1761 0.1740
Wherein these values correspond to the indexes as follows:
1 2 3
2 3 4
3 4 5
4 5 6
5 6 7
6 7 8
7 8 9
8 9 10
I'm trying to write a script such that I read the input line by line and output an average of the prediction values as appropriate for each index (average the two twos together, the three threes, the three fours, etc) such that there is one per line:
0.6825
0.7001
etc...
However, I'm not sure how to read in the data such that I'm able to differentiate between 0.6825 and 0.7022 in line 1, for example. How do I read in the information such that I'm able to store it all in a contiguous array?
As far as logic goes, I figure that I can have special cases for the first, second, last, and second-to-last values, and run a loop for the rest.
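The reading part can be sketched as follows. This is a minimal illustration, assuming the values are separated by whitespace as in the sample above; the name `raw` and the inline string are stand-ins for a real input file.

```python
# Two sample rows copied from the question; a real script would
# iterate over an open file instead of this inline string.
raw = """0.6825 0.7022 0.7023
0.6193 0.6410 0.6389"""

# split() with no arguments splits on any run of whitespace,
# so each number on a line becomes its own list element.
rows = [[float(v) for v in line.split()] for line in raw.splitlines()]

print(rows[0][0])  # first value on line 1
print(rows[0][1])  # second value on line 1
```

With the data held as a list of lists, `rows[r][c]` addresses the value in row `r`, column `c` (both 0-based), which is enough to tell the values on one line apart.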
Solution
Other tips
This is how I would go about it:
- First, aggregate the values into an index:values dictionary (using a defaultdict)
- Then create a new dictionary that holds the average of the values stored for each index
For example:
from collections import defaultdict
from pprint import pprint

# Collect every prediction made for each index
aggregated = defaultdict(list)
with open('predictions.txt') as data:
    for i, line in enumerate(data, start=1):
        values = [float(v) for v in line.split()]
        for offset, value in enumerate(values):
            # the value in column `offset` of line i belongs to index i + offset
            index = i + offset
            aggregated[index].append(value)

# Average the predictions collected for each index
averaged = {}
for index, values in aggregated.items():
    averaged[index] = sum(values) / float(len(values))

pprint(averaged)
Output:
{1: 0.6825,
2: 0.66075,
3: 0.6455666666666667,
4: 0.6171333333333333,
5: 0.5222333333333333,
6: 0.3865,
7: 0.22266666666666668,
8: 0.16313333333333332,
9: 0.1514,
10: 0.174}
Also see the docs for defaultdict and enumerate.
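As a self-contained variant of the dictionary approach, the sketch below wraps the same idea in a function and prints one average per line, as the question asks. The function name `shifted_average` is hypothetical, and the sample rows are inlined from the question instead of being read from a file.

```python
from collections import defaultdict


def shifted_average(rows):
    """Average all values that belong to the same index.

    Assumes rows[r][c] is a prediction for index r + c + 1 (1-based),
    which matches the layout described in the question.
    """
    buckets = defaultdict(list)
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            buckets[r + c + 1].append(value)
    # Return the averages ordered by index
    return [sum(buckets[k]) / len(buckets[k]) for k in sorted(buckets)]


# Sample data copied from the question
rows = [
    [0.6825, 0.7022, 0.7023],
    [0.6193, 0.6410, 0.6389],
    [0.5934, 0.6159, 0.6145],
    [0.5966, 0.6191, 0.6184],
    [0.3331, 0.3549, 0.3500],
    [0.1862, 0.2015, 0.1999],
    [0.1165, 0.1270, 0.1267],
    [0.1625, 0.1761, 0.1740],
]

averages = shifted_average(rows)
for value in averages:
    print(value)
```

Because the values are grouped by index before averaging, the first, second, last, and second-to-last indexes need no special cases, and the same function should work for rows of any width.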