Question

I've got a numpy array (actually a pandas DataFrame, but the array will do) whose values I would like to permute. The catch is that it contains a number of non-randomly positioned NaNs that need to stay in place. So far I have an iterative solution: build a list of the indices of the non-NaN cells, make a permuted copy of that list, and then assign each value from its original index to the corresponding permuted index. Any suggestions on how to do this more quickly? The matrix has millions of values, and ideally I'd like to do many permutations, but the iterative solution is prohibitively slow.

Here's the iterative solution:

import numpy, pandas

df = pandas.DataFrame(numpy.random.randn(3,3), index=list("ABC"), columns=list("abc"))
df.loc[["A", "C"], "a"] = numpy.nan  # .loc needs the row labels, not positions 0 and 2
indices = []

for row in df.index:
    for col in df.columns:
        if not numpy.isnan(df.loc[row, col]):
            indices.append((row, col))

permutedIndices = numpy.random.permutation(indices)
permuteddf = pandas.DataFrame(index=df.index, columns=df.columns)
for i in range(len(indices)):
    permuteddf.loc[permutedIndices[i][0], permutedIndices[i][1]] = df.loc[indices[i][0], indices[i][1]]

With results:

In [19]: df
Out[19]: 
         a         b         c
A      NaN  0.816350 -1.187731
B -0.58708 -1.054487 -1.570801
C      NaN -0.290624 -0.453697

In [20]: permuteddf
Out[20]: 
          a          b          c
A       NaN  -0.290624  0.8163501
B -1.570801 -0.4536974  -1.054487
C       NaN -0.5870797  -1.187731

Solution

How about permuting only the non-NaN entries, selected with a boolean mask:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randn(5,5))
>>> df[df < 0.1] = np.nan
>>> df
          0         1         2         3         4
0       NaN  1.721657  0.446694       NaN  0.747747
1  1.178905  0.931979       NaN       NaN       NaN
2  1.547098       NaN       NaN       NaN  0.225014
3       NaN       NaN       NaN  0.886416  0.922250
4  0.453913  0.653732       NaN  1.013655       NaN

[5 rows x 5 columns]
>>> movers = ~np.isnan(df.values)  # True where the cell holds a real value
>>> df.values[movers] = np.random.permutation(df.values[movers])  # shuffle only those cells, in place
>>> df
          0         1         2         3         4
0       NaN  1.013655  1.547098       NaN  1.721657
1  0.886416  0.446694       NaN       NaN       NaN
2  1.178905       NaN       NaN       NaN  0.453913
3       NaN       NaN       NaN  0.747747  0.653732
4  0.922250  0.225014       NaN  0.931979       NaN

[5 rows x 5 columns]
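Writing through df.values relies on it returning a writable view of the frame's data, which pandas does not guarantee (for example with mixed dtypes, or with copy-on-write enabled). A minimal sketch of the same mask-and-permute idea that works on an explicit copy and returns a new frame might look like the following. The function name permute_keep_nan is hypothetical, and the NumPy Generator API is swapped in for np.random.permutation so the shuffle can be seeded; neither comes from the answer above.

```python
import numpy as np
import pandas as pd


def permute_keep_nan(df, rng=None):
    """Return a copy of df with its non-NaN values shuffled; NaNs stay put.

    Hypothetical helper: assumes every column can be cast to float.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Explicit float copy, so the input frame is never modified.
    values = df.to_numpy(dtype=float, copy=True)
    # Boolean mask: True where a value is allowed to move.
    movers = ~np.isnan(values)
    # Shuffle only the masked values and write them back into the same slots.
    values[movers] = rng.permutation(values[movers])
    return pd.DataFrame(values, index=df.index, columns=df.columns)


rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((5, 5)))
df[df < 0.1] = np.nan

permuted = permute_keep_nan(df, rng)
```

Because the original frame is left untouched, many permutations can be drawn by calling permute_keep_nan(df, rng) repeatedly with the same generator.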
Licensed under: CC-BY-SA with attribution