Question

Note: I'm using Python and numpy arrays.

I have many arrays which all have two columns and many rows. There are some NaN values in the second column; the first column only has numbers.

I would like to sort each array in increasing order according to the second column, leaving the NaN values out. It's a big dataset so I would rather not have to convert the NaN values into zeros or something.

I'd like it to sort like so:

105.  4.
22.   10.
104.  26.
...
...
...
53.   520.
745.  902.
184.  nan
19.   nan

First I tried using fix_invalid which converts the NaNs into 1x10^20:

# data.txt has one of the arrays with 2 columns and a bunch of rows.
from operator import itemgetter
from numpy import array, genfromtxt, ma

Data_0_30 = array(genfromtxt(fname='data.txt'))

g = open("iblah.txt", "a") #saves to file

def Sorted_i_M_W(mass):
    masked = ma.fix_invalid(mass)
    print  >> g, array(sorted(masked, key=itemgetter(1)))

Sorted_i_M_W(Data_0_30)

g.close()

Or I replaced the function with something like this:

def Sorted_i_M_W(mass):
    sortedmass = sorted( mass, key=itemgetter(1))
    print  >> g, array(sortedmass)

For each attempt I got something like:

...
[  4.46800000e+03   1.61472200e+11]
[  3.72700000e+03   1.74166300e+11]
[  4.91800000e+03   1.75502300e+11]
[  6.43500000e+03              nan]
[  3.95520000e+04   8.38907500e+09]
[  3.63750000e+04   1.27625700e+10]
[  2.08810000e+04   1.28578500e+10]
...

At the location of each NaN value, the sorting starts over.

(With fix_invalid, the NaN in the above excerpt shows up as 1.00000000e+20 instead.) But I'd like the sorting to ignore the NaN values completely.

What's the easiest way to sort this array the way I want?


Solution

Not sure if it can be done with numpy.sort, but you can use numpy.argsort for sure:

>>> arr
array([[ 105.,    4.],
       [  53.,  520.],
       [ 745.,  902.],
       [  19.,   nan],
       [ 184.,   nan],
       [  22.,   10.],
       [ 104.,   26.]])
>>> arr[np.argsort(arr[:,1])]
array([[ 105.,    4.],
       [  22.,   10.],
       [ 104.,   26.],
       [  53.,  520.],
       [ 745.,  902.],
       [  19.,   nan],
       [ 184.,   nan]])
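As a self-contained sketch of the same idea (the function name sort_by_second_column is made up for illustration, and a stable sort kind is assumed so that NaN rows keep their original relative order), the argsort result can also be combined with a boolean mask to drop the NaN rows without converting them to anything:

```python
import numpy as np

def sort_by_second_column(arr):
    """Return the rows of arr ordered by column 1; NaN rows end up last."""
    # argsort places NaN at the end; "stable" keeps ties in input order
    order = np.argsort(arr[:, 1], kind="stable")
    return arr[order]

arr = np.array([[105., 4.],
                [53., 520.],
                [745., 902.],
                [19., np.nan],
                [184., np.nan],
                [22., 10.],
                [104., 26.]])

result = sort_by_second_column(arr)

# Optionally leave the NaN rows out entirely after sorting
finite_rows = result[~np.isnan(result[:, 1])]
```

For many arrays, the same function can simply be applied to each one in a loop or list comprehension.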

Other tips

You can create a masked array:

a = np.loadtxt('test.txt')

mask = np.isnan(a)
ma = np.ma.masked_array(a, mask=mask)

And then sort a using the masked array:

a[np.argsort(ma[:, 1])]
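A variation on the masked-array approach, as a rough sketch: np.ma.masked_invalid builds the mask in one call, and the masked array's own argsort method takes an endwith flag that decides whether the masked entries go last or first. The sample data below is invented for the example.

```python
import numpy as np

a = np.array([[105., 4.],
              [19., np.nan],
              [53., 520.],
              [22., 10.]])

# Mask every NaN (and inf) entry in one step
masked = np.ma.masked_invalid(a)

# endwith=True is the default: masked (NaN) rows are placed last
nan_last = a[masked[:, 1].argsort(kind="stable")]

# endwith=False places the masked (NaN) rows first instead
nan_first = a[masked[:, 1].argsort(kind="stable", endwith=False)]
```

Note that masked_invalid also masks infinite values, which may or may not be what you want for your data.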

If you're using an older version of numpy and don't want to upgrade (or if you want code that supports older versions of numpy) you can do:

import numpy as np

def nan_argsort(a):
    temp = a.copy()
    temp[np.isnan(a)] = np.inf
    return temp.argsort()

# named sorted_a so that the built-in sorted() is not shadowed
sorted_a = a[nan_argsort(a[:, 1])]

In newer versions of numpy (1.4.0 and later, according to the numpy.sort documentation), numpy's sort/argsort already has this behavior and places NaN values at the end. If you need to use python's sort for some reason, you can make your own compare function as described in the other answers.
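One caveat with replacing NaN by inf is that genuine inf values then tie with the NaN entries, so their relative order is not guaranteed. A possible alternative, sketched here with a made-up helper name nan_last_argsort, is to sort on two keys with np.lexsort: first on "is it NaN", then on the value itself.

```python
import numpy as np

def nan_last_argsort(col):
    """Indices that sort col ascending, with every NaN after all numbers."""
    nan_flag = np.isnan(col)
    # lexsort treats the LAST key as the primary one:
    # primary = nan_flag (False before True), secondary = the values
    return np.lexsort((col, nan_flag))

col = np.array([3.0, np.nan, np.inf, 1.0, np.nan, 2.0])
order = nan_last_argsort(col)
sorted_col = col[order]
```

Here the real inf stays in front of both NaN entries, and no modified copy of the data is needed beyond the boolean flag array.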

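You can use a comparison function with Python's built-in sorted (the cmp argument exists in Python 2 only):

from math import isnan

def cmpnan(x, y):
    if isnan(x[1]):
        return 1 # x is "larger"
    elif isnan(y[1]):
        return -1 # x is "smaller"
    else:
        return cmp(x[1], y[1]) # compare numbers

sorted(data, cmp=cmpnan)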

See http://docs.python.org/2.7/library/functions.html#sorted
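In Python 3, sorted no longer accepts a cmp argument and the built-in cmp is gone, so a comparison function has to be wrapped with functools.cmp_to_key. A minimal sketch of that port follows; the name cmp_nan_last and the sample rows are illustrative only, and the case where both values are NaN is treated as a tie.

```python
from functools import cmp_to_key
from math import isnan

def cmp_nan_last(x, y):
    """Compare two rows by their second element, treating NaN as largest."""
    x_nan, y_nan = isnan(x[1]), isnan(y[1])
    if x_nan and y_nan:
        return 0   # two NaN rows count as equal
    if x_nan:
        return 1   # x is "larger"
    if y_nan:
        return -1  # x is "smaller"
    # replacement for Python 2's cmp(): yields -1, 0 or 1
    return (x[1] > y[1]) - (x[1] < y[1])

data = [[105.0, 4.0], [19.0, float("nan")], [53.0, 520.0], [22.0, 10.0]]
result = sorted(data, key=cmp_to_key(cmp_nan_last))
```

This runs at Python speed, so for large arrays the numpy argsort approach is likely to be much faster.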

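If you really don't want to use numpy's sorting functions, you could sort the row indices by the second column and then use those indices to index your array. The key has to flag the NaN values explicitly; with a plain value key, Python's sorted gives the same broken ordering as in the question, because every comparison with NaN is False.

It can be done in one line like this:

yourarray[sorted(range(len(yourarray)), key=lambda k: (np.isnan(yourarray[k, 1]), yourarray[k, 1]))]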
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow