Question

I will explain my issue using an example:

A=[[1,2,10],[1,2,10],[3,4,5]]
B=[[1,2,30],[6,7,9]]

From these lists of lists, I would like to create a third one:

C=A+B

So I get:

C= [[1, 2, 10], [1, 2, 10], [3, 4, 5], [1, 2, 30], [6, 7, 9]]

Notice that C contains three lists, [1, 2, 10], [1, 2, 10] and [1, 2, 30], which, described in terms of [x,y,z], have the same x,y but different z.

So I would like to have this new list:

Averaged= [(1, 2, 16.666), (6, 7, 9), (3, 4, 5)]

where each x,y pair appears only once. For the lists

[1, 2, 10], [1, 2, 10], [1, 2, 30]

the result holds the average of the corresponding z values: (10+10+30)/3=16.666

I started with for loops, but ended up trying to do this with defaultdict.

This is what I have so far. It keeps each (x,y) pair once, but it sums the corresponding z values instead of averaging them:

from collections import defaultdict
Averaged=[]

A=[[1,2,10],[1,2,10],[3,4,5]]
B=[[1,2,30],[6,7,9]]
C=A+B
print "C=",C

ToBeAveraged= defaultdict(int)
for (x,y,z) in C:
    ToBeAveraged[(x,y)] += z
Averaged = [k + (v,) for k, v in ToBeAveraged.iteritems()]    

print 'Averaged=',Averaged

Is it possible to do this with defaultdict? Any ideas?


Solution

itertools.groupby only groups consecutive items that share a key, so you'll need to sort the data first:

>>> import itertools, operator
>>> C = sorted(A + B)
>>> def avg(x):
...     return sum(x) / float(len(x))  # float() avoids integer division on Python 2
...
>>> [[avg(i) for i in zip(*y)] for x, y in
...  itertools.groupby(C, operator.itemgetter(0, 1))]
[[1.0, 2.0, 16.666666666666668], [3.0, 4.0, 5.0], [6.0, 7.0, 9.0]]

If you just want the groups before the average:

[list(y) for x,y in itertools.groupby(C, operator.itemgetter(0,1))]
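
The groupby approach above averages x and y as well, which turns them into floats. As a rough sketch of a variant that leaves x and y untouched and averages only z, something like the following might work. It is written for Python 3, and the function name average_by_xy is a hypothetical helper, not part of the original answer.

```python
from itertools import groupby
from operator import itemgetter

A = [[1, 2, 10], [1, 2, 10], [3, 4, 5]]
B = [[1, 2, 30], [6, 7, 9]]

def average_by_xy(rows):
    """Return one (x, y, mean_z) tuple per distinct (x, y) pair, in sorted order."""
    key = itemgetter(0, 1)
    result = []
    # groupby only merges adjacent rows with equal keys, so sort by the same key first
    for (x, y), group in groupby(sorted(rows, key=key), key=key):
        zs = [row[2] for row in group]
        result.append((x, y, sum(zs) / len(zs)))
    return result

averaged = average_by_xy(A + B)
print(averaged)
```

Because the rows are sorted before grouping, the result comes back ordered by (x, y) rather than in the order the pairs first appeared.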

OTHER TIPS

In your code you are not dividing by the number of observations. I changed your code to collect all observations for a given (x, y) pair and then take their average. There is probably a more efficient solution, but this works.

from collections import defaultdict
Averaged=[]

A=[[1,2,10],[1,2,10],[3,4,5]]
B=[[1,2,30],[6,7,9]]
C=A+B
print "C=",C

def get_mean(x):
    return sum(x) / float(len(x))

ToBeAveraged= defaultdict(list)
for (x,y,z) in C:
    ToBeAveraged[(x,y)].append(z)
Averaged = [k + (get_mean(v),) for k, v in ToBeAveraged.iteritems()]    

print 'Averaged=',Averaged

Result:

C= [[1, 2, 10], [1, 2, 10], [3, 4, 5], [1, 2, 30], [6, 7, 9]]
Averaged= [(1, 2, 16.666666666666668), (6, 7, 9.0), (3, 4, 5.0)]
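
The answer above notes that a more efficient solution should exist. One possible direction, sketched here under the assumption of Python 3 (where iteritems() becomes items()), is to store only a running total and a count per (x, y) pair instead of a list of every z value. The name average_by_xy_running is a hypothetical helper introduced for illustration.

```python
from collections import defaultdict

A = [[1, 2, 10], [1, 2, 10], [3, 4, 5]]
B = [[1, 2, 30], [6, 7, 9]]

def average_by_xy_running(rows):
    """Average z per (x, y) while storing only a running total and a count."""
    totals = defaultdict(lambda: [0, 0])  # (x, y) -> [sum of z, number of rows]
    for x, y, z in rows:
        entry = totals[(x, y)]
        entry[0] += z
        entry[1] += 1
    # Build (x, y, mean_z) tuples; "/" is true division on Python 3
    return [key + (total / count,) for key, (total, count) in totals.items()]

averaged_running = average_by_xy_running(A + B)
print(averaged_running)
```

This makes a single pass over the data and needs no sorting, so memory use grows with the number of distinct (x, y) pairs rather than with the number of rows.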
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow