Question

I noticed that a lot of R hackers do something like this:

> matrix(c(1,2,3,4,5),nrow=5,ncol=10,byrow=FALSE)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    1    1    1    1    1    1    1    1     1
[2,]    2    2    2    2    2    2    2    2    2     2
[3,]    3    3    3    3    3    3    3    3    3     3
[4,]    4    4    4    4    4    4    4    4    4     4
[5,]    5    5    5    5    5    5    5    5    5     5

Basically, if a vector (here of length 5) is shorter than the "container" it is put into (here a 5 x 10 matrix with 50 elements), it repeats itself until it fills the container. I think this is a really neat feature of R that makes a lot of R code very succinct. Is there a name for this, and is it documented anywhere?

I noticed this pattern in the following code snippet from http://training.bioinformatics.ucdavis.edu/docs/2012/05/DAV/lectures/gene-expression-analysis/gene-expression-analysis.pdf. The function takes a data matrix and performs quantile normalization:

quan.norm<-function(x,quan=0.5){
  ##x: p by n data matrix, where columns are the samples.
  norm<-x
  p<-nrow(x)
  n<-ncol(x)
  x.sort<-apply(x, 2, sort) ## sort genes within a sample
  x.rank<-apply(x,2,rank) ## rank genes within a sample
  ## find the common distribution to be matched to:
  qant.sort<-matrix(apply(x.sort,1,quantile, probs=quan),
                    p,n,byrow=FALSE) #***<----- HERE ***

  ## match each sample to the common distribution:
  for (i in 1:n){
    norm[,i]<-qant.sort[x.rank[,i],i]
  }
  return(norm)
}

I marked the line where this pattern occurs with *** in the comment. I was struck by how succinct this implementation of a rather involved algorithm is.


Solution

As mentioned in the comments, this is called the recycling rule.

From "An Introduction to R":

Vectors occurring in the same expression need not all be of the same length. If they are not, the value of the expression is a vector with the same length as the longest vector which occurs in the expression. Shorter vectors in the expression are recycled as often as need be (perhaps fractionally) until they match the length of the longest vector. In particular a constant is simply repeated.
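To make the quoted rule concrete, here is a minimal sketch in base R. The variable names are made up for illustration; the last part mirrors the matrix() call from the question, where the data vector is recycled to fill the matrix.

```r
# Arithmetic: the shorter vector is recycled to the length of the longer one.
a <- c(1, 2, 3, 4, 5, 6)
b <- c(10, 20)
s <- a + b            # b is used as 10, 20, 10, 20, 10, 20

# A constant is a length-1 vector, so it is simply repeated.
d <- a * 2

# "Fractional" recycling: 5 is not a multiple of 2, so R still recycles
# but emits a warning about the mismatched lengths (suppressed here).
f <- suppressWarnings(1:5 + 1:2)

# matrix() fills column by column by default, so a length-5 vector
# recycled into a 5 x 10 matrix gives ten identical columns.
m <- matrix(1:5, nrow = 5, ncol = 10)

# rep_len() spells out the same recycling explicitly.
identical(as.vector(m), rep_len(1:5, 50))
```

Note the warning in the fractional case: recycling is silent only when the longer length is a multiple of the shorter one, so it is worth checking lengths when relying on it.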

See the R manual for details.

Licensed under: CC-BY-SA with attribution