Depending on how you want to handle repeats, this could work:
In [40]: a
Out[40]: array([4, 4, 2, 1, 0, 3, 3, 1, 0, 2])
In [41]: a_unq, a_inv = np.unique(a, return_inverse=True)
In [42]: a_cnt = np.bincount(a_inv)
In [44]: np.cumsum(a_unq * a_cnt)[a_inv]
Out[44]: array([20, 20, 6, 2, 0, 12, 12, 2, 0, 6], dtype=int64)
Here a is your array flattened, so you would then have to reshape the result back to the original shape.
And of course, from numpy 1.9 on, you can condense inputs 41 and 42 above into a single, faster call:
a_unq, a_inv, a_cnt = np.unique(a, return_inverse=True, return_counts=True)
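For an array of any shape, the steps above could be wrapped up roughly as follows. This is only a sketch: the function name sum_of_elements_le and the strict flag are made up for illustration and are not part of numpy, and it assumes numpy 1.9 or later for return_counts. With strict=True, each element gets the sum of the strictly smaller elements only, which is one other way you might want to handle repeats.

```python
import numpy as np

def sum_of_elements_le(arr, strict=False):
    """For each element, sum all elements of arr that are <= it.

    Hypothetical helper: with strict=True, only strictly smaller
    elements are summed, so repeats of the element are excluded.
    """
    arr = np.asarray(arr)
    flat = arr.ravel()  # work on the flattened array
    # return_counts needs numpy >= 1.9
    unq, inv, cnt = np.unique(flat, return_inverse=True, return_counts=True)
    per_value = unq * cnt          # total contributed by each distinct value
    totals = np.cumsum(per_value)  # running total over the sorted distinct values
    if strict:
        totals = totals - per_value  # drop each value's own contribution
    # map the totals back to the elements and restore the original shape
    return totals[inv.reshape(-1)].reshape(arr.shape)

a = np.array([4, 4, 2, 1, 0, 3, 3, 1, 0, 2]).reshape(2, 5)
result = sum_of_elements_le(a)
result_strict = sum_of_elements_le(a, strict=True)
```

Here result has the same 2x5 shape as a, and its flattened values match Out[44] above.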