calculations on big satellite image crashes python and computer freezes [duplicate]

https://stackoverflow.com//questions/10658262

11-12-2019
|

Question

Possible Duplicate:
Big Satellite Image Processing

I'm trying to run Mort Canty's Python iMAD implementation on bitemporal RapidEye Multispectral images. Which basically calculates the canonical correlation for the two images and then substracts them. The problem I'm having is that the images are of 5000 x 5000 x 5 (bands) pixels. When I run this on the whole image my computer crashes terribly and I have to turn it off.

Does anyone have an idea what can make python crash the computer like this? If I for example choose 2999x2999 pixels per band everything runs fine.

8 gbs ram, I7-2617M 1.5 1.5 ghz, Windows7 64 bits. Im using the 64 bit version of everything: python(2.7), numpy, scipy and gdal.

thank you in advance!

    def covw(dm,w):
    # weighted covariance matrix and means 
    # from (transposed) data array    
       N = size(dm,0) 
       n = size(w)
       sumw = sum(w)
       ws = tile(w,(N,1))
       means = mat(sum(ws*dm,1)/sumw).T
       means = tile(means,(1,n))
       dmc = dm - means
       dmc = multiply(dmc,sqrt(ws))
       covmat = dmc*dmc.T/sumw
       return (covmat,means)

    def main():
    # ------------test---------------------------------------------------------------    
    if len(sys.argv) == 1:        
    (sys.argv).extend(['-p','[0,1,2,3,4]','-  
    d','[0,4999,0,4999]',
'c://users//pythonxy//workspace//1uno.tif','c://users//pythonxy//workspace//2dos.tif'])
    # -------------------------------------------------------------------------------        

options, args = getopt.getopt(sys.argv[1:],'hp:d:')
pos = None
dims = None            
for option, value in options:
    if option == '-h':
        print 'Usage: python %s [-p "bandPositions" -d "spatialDimensions"] 
        filename1   filename2' %sys.argv[0]
        print '       bandPositions and spatialDimensions are quoted lists, 
        e.g., -p "[0,1,3]" -d "[0,400,0,400]"  \n'
        sys.exit(1) 
    elif option == '-p':
        pos = eval(value)
    elif option == '-d':
        dims = eval(value) 
if len(args) != 2:
    print 'Incorrect number of arguments'
    print 'Usage: python %s [-p "bandspositions" -d "spatialdimensions"] 
    filename1 filename2 \n' %sys.argv[0]
    sys.exit(1)                                    
gdal.AllRegister()
fn1 = args[0]
fn2 = args[1]
path = os.path.dirname(fn1)
basename1 = os.path.basename(fn1)
root1, ext = os.path.splitext(basename1)
basename2 = os.path.basename(fn2)
outfn = path+'\\MAD['+basename1+'-'+basename2+']'+ext
inDataset1 = gdal.Open(fn1,GA_ReadOnly)     
inDataset2 = gdal.Open(fn2,GA_ReadOnly)
cols = inDataset1.RasterXSize
rows = inDataset1.RasterYSize    
bands = inDataset1.RasterCount
cols2 = inDataset2.RasterXSize
rows2 = inDataset2.RasterYSize    
bands2 = inDataset2.RasterCount
if (rows != rows2) or (cols != cols2) or (bands != bands2):
    sys.stderr.write("Size mismatch")
    sys.exit(1)
if pos is None:
    pos = range(bands)
else:
    bands = len(pos) 
if dims is None:
    x0 = 0
    y0 = 0
else:
    x0 = dims[0]
    y0 = dims[2]  
    cols = dims[1]-dims[0] + 1  
    rows = dims[3]-dims[2] + 1                       
# initial weights
wt = ones(cols*rows)      
# data array (transposed so observations are columns)
dm = zeros((2*bands,cols*rows),dtype='float32')
k = 0
for b in pos:
    band1 = inDataset1.GetRasterBand(b+1)
    band1 = band1.ReadAsArray(x0,y0,cols,rows).astype(float)
    dm[k,:] = ravel(band1)
    band2 = inDataset2.GetRasterBand(b+1)
    band2 = band2.ReadAsArray(x0,y0,cols,rows).astype(float)        
    dm[bands+k,:] = ravel(band2)
    k += 1
print '========================='
print '       iMAD'
print '========================='
print 'time1: '+fn1
print 'time2: '+fn2   
print 'Delta    [canonical correlations]'   
# iteration of MAD        
delta = 1.0
oldrho = zeros(bands)
iter = 0
while (delta > 0.001) and (iter < 50):    
#     weighted covariance matrices and means 
    sigma,means = covw(dm,wt)          
    s11 = mat(sigma[0:bands,0:bands])
    s22 = mat(sigma[bands:,bands:]) 
    s12 = mat(sigma[0:bands,bands:])
    s21 = mat(sigma[bands:,0:bands])
#     solution of generalized eigenproblems
    s22i = mat(linalg.inv(s22))
    lama,a = linalg.eig(s12*s22i*s21,s11) 
    s11i = mat(linalg.inv(s11))    
    lamb,b = linalg.eig(s21*s11i*s12,s22) 
#     sort a   
    idx = argsort(lama)
    a = a[:,idx]
#     sort b         
    idx = argsort(lamb)
    b = b[:,idx]           
#     canonical correlations        
    rho = sqrt(real(lamb[idx]))             
#     normalize dispersions   
    a = mat(a)
    tmp1 = a.T*s11*a
    tmp2 = 1./sqrt(diag(tmp1))
    tmp3 = tile(tmp2,(bands,1))
    a = multiply(a,tmp3)
    b = mat(b) 
    tmp1 = b.T*s22*b
    tmp2 = 1./sqrt(diag(tmp1))
    tmp3 = tile(tmp2,(bands,1))
    b = multiply(b,tmp3)
#     assure positive correlation
    tmp = diag(a.T*s12*b)
    b = b*diag(tmp/abs(tmp))
#     canonical and MAD variates
    U = a.T*mat(dm[0:bands,:]-means[0:bands,:])    
    V = b.T*mat(dm[bands:,:]-means[bands:,:])           
    MAD = U-V  
#     new weights        
    var_mad = tile(mat(2*(1-rho)).T,(1,rows*cols))    
    chisqr = sum(multiply(MAD,MAD)/var_mad,0)
    wt = 1-stats.chi2.cdf(chisqr,[bands])
#     continue iteration         
    delta = sum(abs(rho-oldrho))
    oldrho = rho
    print delta
    iter += 1   
# write results to disk
driver = inDataset1.GetDriver()    
outDataset = driver.Create(outfn,cols,rows,bands+1,GDT_Float32)
projection = inDataset1.GetProjection()
geotransform = inDataset1.GetGeoTransform()
if geotransform is not None:
    gt = list(geotransform)
    gt[0] = gt[0] + x0*gt[1]
    gt[3] = gt[3] + y0*gt[5]
    outDataset.SetGeoTransform(tuple(gt))
if projection is not None:
    outDataset.SetProjection(projection)        
for k in range(bands):        
    outBand = outDataset.GetRasterBand(k+1)
    outBand.WriteArray(resize(MAD[k,:],(rows,cols)),0,0) 
    outBand.FlushCache()
outBand = outDataset.GetRasterBand(bands+1)    
outBand.WriteArray(resize(chisqr,(rows,cols)),0,0) 
outBand.FlushCache()    
outDataset = None
inDataset1 = None
inDataset2 = None  
print 'result written to: '+outfn
print '---------------------------------'

if name == 'main': main()

Solution

It sounds like this operation simply takes more memory than your computer can provide. This is an oversimplification, but when the system runs out of actual RAM to use, it sometimes writes sections of memory which appear to be being used less onto your hard disk, so it can use that real memory for something else. The hard disk is many orders of magnitude slower than main memory, so when your software needs parts of memory which have been written to the disk, everything can get real slow. When this happens at a dramatic scale, and parts of your software and operating system are constantly trying to use pieces of memory which have been swapped out (written to disk), your hard drive can get a major workout trying to seek back and forth, writing lots of stuff and reading lots of stuff and writing more stuff, etc. The system can become quite unresponsive in a situation like this.

You could see if this is really what's happening by watching your system's activity monitors (I forget what they're called on Windows, but I know they're there; some software that shows you how much memory is allocated, being used, etc, and draws a nice graph for you). While watching those, start up your program and watch what the memory allocation rate looks like.

There are probably some ways to alleviate memory usage in this code, if fewer things are kept in memory at a time, but I don't see offhand what they are. You could also add more RAM to your system in hopes that it would fix this.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow