Question

Suppose I have an N*M*X-dimensional array "data", where N and M are fixed, but X is variable for each entry data[n][m].

(Edit: To clarify, I just called np.array() on the 3D Python list I used for reading in the data, so the NumPy array has shape N*M and its entries are variable-length lists.)

I'd now like to compute the average over the X dimension, so that I'm left with an N*M array. Using np.average/np.mean with the axis argument doesn't work, so the way I'm doing it right now is iterating over N and M and appending the manually computed average to a new list, but that just doesn't feel very "Pythonic":

import numpy as np

avgData = []
for n in data:                       # loop over the N axis
    temp = []
    for m in n:                      # loop over the M axis
        temp.append(np.average(m))   # average one variable-length list
    avgData.append(temp)

Am I missing something obvious here? I'm trying to freshen up my Python skills while I'm at it, so interesting/varied responses are more than welcome! :)

Thanks!


Solution

What about using np.vectorize:

do_avg = np.vectorize(np.average)
data_2d = do_avg(data)
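
For example, here is a minimal sketch with made-up sample data (note that recent NumPy versions require dtype=object when building a ragged array):

import numpy as np

# hypothetical ragged data: a 2x2 object array of variable-length lists,
# as described in the question
data = np.array([[1, 2, 3], [0, 3, 2, 4], [0, 2], [1]], dtype=object).reshape(2, 2)

do_avg = np.vectorize(np.average)
print(do_avg(data))
# [[2.   2.25]
#  [1.   1.  ]]

Keep in mind that np.vectorize is essentially a convenience wrapper around a Python loop, so it does not make the per-element calls any faster.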

OTHER TIPS

import numpy as np

# dtype=object is required for a ragged array; without it, recent NumPy
# versions refuse to build an array from variable-length lists
data = np.array([[1,2,3],[0,3,2,4],[0,2],[1]], dtype=object).reshape(2,2)
avg = np.zeros(data.shape)
avg.flat = [np.average(x) for x in data.flat]
print(avg)
# [[2.   2.25]
#  [1.   1.  ]]

This still iterates over the elements of data (nothing un-Pythonic about that). But since there's nothing special about the shape or axes of data, I'm just using data.flat. Rather than appending to a Python list, with NumPy it is better to assign values to the elements of a preallocated array.

There are fast numeric methods for working with NumPy arrays, but most (if not all) require simple numeric dtypes. Here the array elements are object (each one a list or array), so NumPy has to resort to the usual Python iteration and list operations.
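
A quick way to see this, assuming an array built as in the snippet above:

print(data.dtype)        # object
print(type(data[0, 0]))  # <class 'list'> -- each element is a plain Python list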

For this small example, this solution is a bit faster than Zwicker's np.vectorize solution. For larger data the two approaches take about the same time.
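
If you want to check the timing on your own machine, a rough timeit sketch (reusing the hypothetical data from above) could look like this; the exact numbers will depend on your NumPy version and hardware:

import timeit
import numpy as np

data = np.array([[1, 2, 3], [0, 3, 2, 4], [0, 2], [1]], dtype=object).reshape(2, 2)
do_avg = np.vectorize(np.average)

def flat_assign(a):
    # the flat-assignment approach from this answer
    avg = np.zeros(a.shape)
    avg.flat = [np.average(x) for x in a.flat]
    return avg

print(timeit.timeit(lambda: do_avg(data), number=10000))
print(timeit.timeit(lambda: flat_assign(data), number=10000))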

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow