Domanda

I'm trying to see if numpy.histogram2d will cross tabulate data in 2 arrays for me. I've never used this function before and I'm getting an error I don't know how to fix.

import numpy as np
import random
zones = np.zeros((20,30), int)
values = np.zeros((20,30), int)
for i in range(20):
    for j in range(30):
        values[i,j] = random.randint(0,10)
zones[:8,:15] = 100
zones[8:,:15] = 101
zones[:8,15:] = 102
zones[8:,15:] = 103
np.histogram2d(zones,values)

This code results in the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-18-53447df32000> in <module>()
----> 1 np.histogram2d(zones,values)

C:\Python27\ArcGISx6410.2\lib\site-packages\numpy\lib\twodim_base.pyc in histogram2d(x, y, bins, range, normed, weights)
    613         xedges = yedges = asarray(bins, float)
    614         bins = [xedges, yedges]
--> 615     hist, edges = histogramdd([x,y], bins, range, normed, weights)
    616     return hist, edges[0], edges[1]
    617 

C:\Python27\ArcGISx6410.2\lib\site-packages\numpy\lib\function_base.pyc in histogramdd(sample, bins, range, normed, weights)
    279         # Sample is a sequence of 1D arrays.
    280         sample = atleast_2d(sample).T
--> 281         N, D = sample.shape
    282 
    283     nbin = empty(D, int)

ValueError: too many values to unpack

Here is what I am trying to accomplish:

I have 2 arrays. One array comes from a geographic dataset (raster) representing Landcover classes (e.g. 1=Tree, 2=Grass, 3=Building, etc.). The other array comes from a geographic dataset (raster) representing some sort of political boundary (e.g. parcels, census blocks, towns, etc). I am trying to get a table that lists each unique political boundary area (array values represent a unique id) as rows and the total number of pixels within each boundary for each landcover class as columns.

È stato utile?

Soluzione

I'm assuming values is the landcover and zones is the political boundaries. You might want to use np.bincount, which is like a special histogram where each bin has spacing and width of exactly one.

import numpy as np

zones = np.zeros((20,30), int)
zones[:8,:15] = 100
zones[8:,:15] = 101
zones[:8,15:] = 102
zones[8:,15:] = 103

values = np.random.randint(0,10,(20,30))  # no need for that loop

tab = np.array([np.bincount(values[zones==zone]) for zone in np.unique(zones)])

You can do this more simply with histogram, though, if you are careful with the bin edges:

np.histogram2d(zones.flatten(), values.flatten(), bins=[np.unique(zones).size, values.max()-values.min()+1])

The way this works is as follows. The easiest example is to look at all values regardless of zone:

np.bincount(values)

Which gives you one row with the counts for each value (0 to 10). The next step is to look at the zones. For one zone, you'd have just one row, and it would be:

zone = 101         # the desired zone
mask = zone==zones # a mask that is True wherever your zones map matches the desired zone
np.bincount(values[mask])  # count the values where the mask is True

Now, we just want to do this for each zone in the map. You can get a list of the unique values in your zones map with

zs = np.unique(zones)

and loop through it with a list comprehension, where each item is one of the rows as above:

tab = np.array([np.bincount(values[zones==zone]) for zone in np.unique(zones)])

Then, your table looks like this:

print tab
# elements with cover = 
#  0  1  2  3  4  5  6  7  8  9     # in zone:
[[16 11 10 12 13 15 11  7 13 12]    # 100
 [13 23 15 16 24 16 24 21 15 13]    # 101
 [10 12 23 13 12 11 11  5 11 12]    # 102
 [19 25 20 12 16 19 13 18 22 16]]   # 103

Finally, you can plot this in matplotlib as so:

import matplotlib.pyplot as plt

plt.hist2d(zones.flatten(), values.flatten(), bins=[np.unique(zones).size, values.max()-values.min()+1])

Altri suggerimenti

histogram2d expects 1D arrays as input, and your zones and values are 2D. You could linearize them with ravel:

np.histogram2d(zones.ravel(), values.ravel())

If efficiency isn't a concern, I think this works for what you want to do

from collections import Counter

c = Counter(zip(zones.flat[:], landcover_classes.flat[:]))

c will contain key/val tuples where the key is a tuple of (zone, landcover class). You can populate an array if you like with

for (i, j), count in c.items():
    my_table[i, j] = count

That only works, of course, if i and j are sequential integers starting at zero (i.e., from 0 to Ni and 0 to Nj).

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top