Pregunta

I'm very new to python and numpy, so sorry if I misuse some terminology.

I have converted a raster to a 2D numpy array in the hopes of doing calculations on it quickly and efficiently.

  • I need to get the cumulative sum across a numpy array such that, for each value, I generate the sum of all values that are less than or equal to that, and write that value to a new array. I need to cycle through the entire array that way.

  • I also need to scale the output between 1 and 100, but that seems
    more straightforward.

Attempt to clarify by example:

array([[ 4,  1  ,  3   ,  2] dtype=float32)

I would want the output values (just doing first row by hand) to read:

array([[ 10,  1  ,  6   ,  3], etc.

Any ideas on how to do that?

Thanks in advance!


The near finished script for anyone who's interested:

#Generate Cumulative Thresholds
#5/15/14

import os
import sys
import arcpy
import numpy as np

#Enable overwriting output data
arcpy.env.overwriteOutput=True

#Set working directory
os.chdir("E:/NSF Project/Salamander_Data/Continuous_Rasters/Canadian_GCM/2020/A2A/")

#Set geoprocessing variables
inRaster = "zero_eurycea_cirrigera_CA2A2020.tif"
des = arcpy.Describe(inRaster)
sr = des.SpatialReference
ext = des.Extent
ll = arcpy.Point(ext.XMin,ext.YMin)

#Convert GeoTIFF to numpy array
a = arcpy.RasterToNumPyArray(inRaster)

#Flatten for calculations
a.flatten()

#Find unique values, and record their indices to a separate object
a_unq, a_inv = np.unique(a, return_inverse=True)

#Count occurences of array indices
a_cnt = np.bincount(a_inv)

#Cumulatively sum the unique values multiplied by the number of
#occurences, arrange sums as initial array
b = np.cumsum(a_unq * a_cnt)[a_inv]

#Divide all values by 10 (reverses earlier multiplication done to
#facilitate accurate translation of ASCII scientific notation
#values < 1 to array)
b /= 10

#Rescale values between 1 and 100
maxval = np.amax(b)
b /= maxval
b *= 100

#Restore flattened array to shape of initial array
c = b.reshape(a.shape)

#Convert the array back to raster format
outRaster = arcpy.NumPyArrayToRaster(c,ll,des.meanCellWidth,des.meanCellHeight)

#Set output projection to match input
arcpy.DefineProjection_management(outRaster, sr)

#Save the raster as a TIFF
outRaster.save("C:/Users/mkcarte2/Desktop/TestData/outRaster.tif")

sys.exit()
¿Fue útil?

Solución

Depending on how you want to handle repeats, this could work:

In [40]: a
Out[40]: array([4, 4, 2, 1, 0, 3, 3, 1, 0, 2])

In [41]: a_unq, a_inv = np.unique(a, return_inverse=True)

In [42]: a_cnt = np.bincount(a_inv)

In [44]: np.cumsum(a_unq * a_cnt)[a_inv]
Out[44]: array([20, 20,  6,  2,  0, 12, 12,  2,  0,  6], dtype=int64)

Where of course a is your array flattened, that you would then have to reshape to the original shape.


And of course once numpy 1.9 is out, you can condense lines 41 and 42 above into the single, faster:

a_unq, a_inv, a_cnt = np.unique(a, return_inverse=True, return_counts=True)

Otros consejos

Edit:

This is ugly, but I think it finally works:

import numpy as np

def cond_cum_sum(my_array):
    my_list = []
    prev = -np.inf
    prev_sum = 0
    for ele in my_array:
        if prev != ele:
            prev_sum += ele
        my_list.append(prev_sum)
        prev = ele
    return np.array(my_list)

a = np.array([[4,2,2,3],
              [9,0,5,2]], dtype=np.float32)

flat_a = a.flatten()
flat_a.sort() 

temp = np.argsort(a.ravel())   

cum_sums = cond_cum_sum(flat_a)

result_1 = np.zeros(len(flat_a))
result_1[temp] = cum_sums

result = result_1.reshape(a.shape)

Result:

>>> result
array([[  9.,   2.,   2.,   5.],
       [ 23.,   0.,  14.,   2.]])

With less numpy and more python:

a = np.array([[4,2,2,3],
              [9,0,5,2]], dtype=np.float32)

np.array([[sum(x for x in arr if x <= subarr) for subarr in arr] for arr in a])
# array([[ 11.,   4.,   4.,   7.],
#        [ 16.,   0.,   7.,   2.]])

If the sum only considers items once no matter how much they appear, then,

np.array([[sum(set(x for x in arr if x <= subarr)) for subarr in arr] for arr in a]) 
# array([[  9.,   2.,   2.,   5.],
#        [ 16.,   0.,   7.,   2.]])

How about this:

a=np.array([ 4,  1  ,  3   ,  2])

np.array([np.sum(a[a<=x])for x in a])

Gives

array([10,  1,  6,  3])

For a 2D array (assuming you want the sum of the whole array, not just the row):

a=np.array([[ 4,  1  ,  3   ,  2],[ 5,  1  ,  3   ,  2]])

np.array([[np.sum(a[a<=x])for x in a[y,:]]for y in range(a.shape[0])])

Gvies

array([[16,  2, 12,  6],
       [21,  2, 12,  6]])
Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top