Question

Problem: I have n vectors of length m stored as the rows of an n by m matrix. These vectors are left-padded with NA values.

Example:

x = matrix( 1:12, ncol=4 )
x[lower.tri(x)] = NA
print(x)
#      [,1] [,2] [,3] [,4]
# [1,]    1    4    7   10
# [2,]   NA    5    8   11
# [3,]   NA   NA    9   12

Question: What is an efficient way to make the rows right-padded? My actual matrix is 4,000 by 25,000.

What I want:

y = matrix( c( 1, 5, 9, 4, 8, 12,
               7, 11, NA, 10, NA, NA ), ncol=4 )
print(y)
#      [,1] [,2] [,3] [,4]
# [1,]    1    4    7   10
# [2,]    5    8   11   NA
# [3,]    9   12   NA   NA

Solution

Here are two one-line solutions:

t(apply(x, 1, FUN=function(ii) c(ii[!is.na(ii)],ii[is.na(ii)])))

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    5    8   11   NA
[3,]    9   12   NA   NA

matrix(apply(x, 1, FUN=function(ii) c(ii[!is.na(ii)],ii[is.na(ii)])),
 byrow=TRUE,ncol=ncol(x))

The idea here is just to go through each row, find the NAs, and move them behind the values that are not NA (i.e. those selected by !is.na).
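
To make the per-row step concrete, here is a minimal sketch on the example matrix from the question. The helper name na_to_end is my own choice for illustration and is not a base R function; it just gives a name to the anonymous function used in the one-liners.

```r
# Rebuild the example matrix from the question
x <- matrix(1:12, ncol = 4)
x[lower.tri(x)] <- NA

# Hypothetical helper (name chosen for illustration): keep the non-NA
# values in their original order, then append the NAs after them
na_to_end <- function(v) c(v[!is.na(v)], v[is.na(v)])

row2 <- x[2, ]               # NA 5 8 11
shifted <- na_to_end(row2)   # 5 8 11 NA

# apply() over rows returns one column per input row,
# so the result is transposed to get the rows back
y <- t(apply(x, 1, na_to_end))
print(y)
```

Because the non-NA values are picked out with a logical index, their relative order within each row is preserved.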

The second version is actually slightly faster on my machine, at least for the small example matrix x:

library(microbenchmark)
microbenchmark(
t(apply(x, 1, FUN=function(ii) c(ii[!is.na(ii)],ii[is.na(ii)]))),
matrix(apply(x, 1, FUN=function(ii) c(ii[!is.na(ii)],ii[is.na(ii)])),
 byrow=T,ncol=4)
) 

Unit: microseconds

    min     lq  median     uq     max neval
 58.159 61.152 62.2215 66.711 174.475   100
 51.317 53.883 54.7380 57.731 127.863   100
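
Timings on a matrix this small may not carry over to a much larger one. A rough sketch for checking at a larger size is below; the test matrix, its dimensions, and the helper name na_to_end are assumptions made up for illustration, and the sizes are kept well below 4,000 by 25,000 so it runs quickly. It uses base R's system.time, so no extra package is needed.

```r
set.seed(1)
n <- 200; m <- 500   # illustrative sizes, much smaller than 4,000 x 25,000
x_big <- matrix(runif(n * m), nrow = n)

# Left-pad each row with a random number of NAs (between 0 and m - 1)
n_na <- sample(0:(m - 1), n, replace = TRUE)
x_big[col(x_big) <= n_na[row(x_big)]] <- NA

# Hypothetical helper: non-NA values first, NAs last
na_to_end <- function(v) c(v[!is.na(v)], v[is.na(v)])

t1 <- system.time(y1 <- t(apply(x_big, 1, na_to_end)))
t2 <- system.time(y2 <- matrix(apply(x_big, 1, na_to_end),
                               byrow = TRUE, ncol = m))

# Both versions should produce the same matrix
stopifnot(identical(y1, y2))
print(rbind(transpose = t1, byrow = t2))
```

Increasing n and m toward the real dimensions should show whether the difference between the two versions still matters at that scale.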
Licensed under: CC-BY-SA with attribution