Question

I am reading data (pixel values, to be exact) from an h5 file and plotting it in a histogram using numpy. The array of pixel values contains my no-data value, which is 99999 (the range of my data is otherwise -40 to 20). I'm creating the histogram with a min and max that I set manually (-40 and 20 respectively), so the no-data value doesn't show up in the histogram, which is fine. However, I want to fit a normal curve over my data, and for this I need the mean and SD of the dataset. When I generate these with numpy.mean and numpy.std, the no-data value is included, so my mean and SD are way off and my subsequent normal curve is too.

Essentially, is there a way to generate the mean and SD from an array while ignoring a given value (i.e. my no-data value, 99999), or alternatively to copy the values of my array to a new array without the no-data value?

Thanks

Solution

Sounds like you should be storing your data in a masked array instead of relying on this hacky 99999 no-data value. Start by looking at np.ma.

Simple example:

>>> import numpy as np
>>> a = np.array([1, 2, 99999, 3])
>>> a.mean()
25001.25
>>> a_ = np.ma.masked_array(a, a == 99999)
>>> a_.mean()
2.0
>>> a_
masked_array(data = [1 2 -- 3],
             mask = [False False  True False],
       fill_value = 999999)
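
Since the question also asks for the standard deviation, here is a minimal sketch of the same idea using np.ma.masked_equal. The `pixels` values and the `NODATA` name are made up for illustration; they are not from the original post.

```python
import numpy as np

NODATA = 99999  # no-data sentinel from the question
# Made-up sample values standing in for the pixel data read from the h5 file
pixels = np.array([-12.5, 3.0, NODATA, 7.5, -30.0, NODATA])

# Mask every element equal to the sentinel
masked = np.ma.masked_equal(pixels, NODATA)

# Statistics on a masked array skip the masked entries
mean = masked.mean()
std = masked.std()

# compressed() returns only the unmasked values as a plain 1-D ndarray
valid = masked.compressed()

print(mean, std, valid)
```

The resulting `mean` and `std` can then be passed to whatever is used to draw the normal curve, and `valid` can be handed to the histogram call directly.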

Other tips

Would it be OK for you to go through the data first, save the useful values in another list (or whatever structure you use), and then process only that new list?
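
With numpy this filtering step does not need an explicit loop. A minimal sketch using boolean indexing, assuming the no-data value is exactly 99999 (the `pixels` values and variable names are illustrative only):

```python
import numpy as np

NODATA = 99999  # no-data sentinel from the question
# Made-up sample values standing in for the real pixel data
pixels = np.array([-12.5, 3.0, NODATA, 7.5, -30.0, NODATA])

# Boolean indexing copies only the elements that are not the sentinel
valid = pixels[pixels != NODATA]

mean = valid.mean()
std = valid.std()

print(valid, mean, std)
```

This produces a new, smaller array and leaves `pixels` untouched, which matches the "new array without the no-data value" option in the question.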

Or try this solution, How to count values in a certain range in a Numpy array?
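
The linked answer is not reproduced here. As a rough sketch of what a range-based approach could look like, assuming the valid data lies in the -40 to 20 range stated in the question (sample values are made up):

```python
import numpy as np

# Made-up sample values; 99999 is the no-data sentinel from the question
pixels = np.array([-12.5, 3.0, 99999, 7.5, -30.0, 99999])

# Element-wise range test; & combines the two boolean arrays
in_range = (pixels >= -40) & (pixels <= 20)

count = int(in_range.sum())  # number of values inside the range
valid = pixels[in_range]     # the values themselves

print(count, valid.mean(), valid.std())
```

Unlike comparing against 99999 directly, this also drops any other out-of-range values, which may or may not be what you want.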

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow