Question

I have the following vector in R. Think of them as a vector of numbers.

x = c(1,2,3,4,...100)

I want to randomize this vector "locally" based on some input number the "locality factor". For example if the locality factor is 3, then the first 3 elements are taken and randomized followed by the next 3 elements and so on. Is there an efficient way to do this? I know if I use sample, it would jumble up the whole array. Thanks in advance

Was it helpful?

Solution 2

General solution:

Edit: As @MatthewLundberg comments, the issue I pointed out with "repeating numbers in x" can be easily overcome by working on seq_along(x), which would mean the resulting values will be indices. So, it'd be like so:

k <- 3
x <- c(2,2,1, 1,3,4, 4,6,5, 3)
x.s <- seq_along(x)
y <- sample(x.s)
x[unlist(split(y, (match(y, x.s)-1) %/% k), use.names = FALSE)]
# [1] 2 2 1 3 4 1 4 5 6 3

Old answer:

The bottleneck here is the amount of calls to function sample. And as long as your numbers don't repeat, I think you can do this with just one call to sample in this manner:

k <- 3
x <- 1:20
y <- sample(x)
unlist(split(y, (match(y,x)-1) %/% k), use.names = FALSE)
# [1]  1  3  2  5  6  4  8  9  7 12 10 11 13 14 15 17 16 18 19 20

To put everything together in a function (I like the name scramble from @Roland's):

scramble <- function(x, k=3) {
    x.s <- seq_along(x)
    y.s <- sample(x.s)
    idx <- unlist(split(y.s, (match(y.s, x.s)-1) %/% k), use.names = FALSE)
    x[idx]
}

scramble(x, 3)
# [1] 2 1 2 3 4 1 5 4 6 3
scramble(x, 3)
# [1] 1 2 2 1 4 3 6 5 4 3

To reduce the answer (and get it faster) even more, following @flodel's comment:

scramble <- function(x, k=3L) {
    x.s <- seq_along(x)
    y.s <- sample(x.s)
    x[unlist(split(x.s[y.s], (y.s-1) %/% k), use.names = FALSE)]
}

OTHER TIPS

Arun didn't like how inefficient my other answer was, so here's something very fast just for him ;)

It requires just one call each to runif() and order(), and doesn't use sample() at all.

x <- 1:100
k <- 3
n <- length(x)

x[order(rep(seq_len(ceiling(n/k)), each=k, length.out=n) + runif(n))]
#  [1]   3   1   2   6   5   4   8   9   7  11  12  10  13  14  15  18  16  17
# [19]  20  19  21  23  22  24  27  25  26  29  28  30  33  31  32  36  34  35
# [37]  37  38  39  40  41  42  43  44  45  47  48  46  51  49  50  52  54  53
# [55]  55  57  56  58  60  59  62  63  61  66  64  65  68  67  69  71  70  72
# [73]  75  74  73  76  77  78  81  80  79  84  82  83  86  85  87  89  88  90
# [91]  93  92  91  94  96  95  97  98  99 100

For the record, the boot package (shipped with base R) includes a function permutation.array() that is used for just this purpose:

x <- 1:100
k <- 3
ii <- boot:::permutation.array(n = length(x), 
                               R = 2, 
                               strata = (seq_along(x) - 1) %/% k)[1,]
x[ii]
#   [1]   2   1   3   6   5   4   9   7   8  12  11  10  15  13  14  16  18  17
#  [19]  21  19  20  23  22  24  26  27  25  28  29  30  33  31  32  36  35  34
#  [37]  38  39  37  41  40  42  43  44  45  46  47  48  51  50  49  53  52  54
#  [55]  57  55  56  59  60  58  63  61  62  65  66  64  67  69  68  72  71  70
#  [73]  75  73  74  76  77  78  79  80  81  82  83  84  86  87  85  89  88  90
#  [91]  93  91  92  94  95  96  97  98  99 100

This will drop elements at the end (with a warning):

locality <- 3
x <- 1:100
c(apply(matrix(x, nrow=locality, ncol=length(x) %/% locality), 2, sample))
## [1]  1  2  3  4  6  5  8  9  7 12 10 11 13 15 14 16 18 17 19 20 21 22 24 23 26 25 27 28 30 29 32 33 31 35 34 36 38 39 37
## [40] 42 40 41 43 44 45 47 48 46 51 49 50 54 52 53 55 57 56 58 59 60 62 61 63 64 65 66 67 69 68 71 72 70 74 75 73 78 77 76
## [79] 80 81 79 83 82 84 87 85 86 88 89 90 92 93 91 96 94 95 99 98 97
v <- 1:16

scramble <- function(vec,n) {
  res <- tapply(vec,(seq_along(vec)+n-1)%/%n,
                FUN=function(x) x[sample.int(length(x), size=length(x))])
  unname(unlist(res))
}

set.seed(42)
scramble(v,3)
#[1]  3  2  1  6  5  4  9  7  8 12 10 11 15 13 14 16

scramble(v,4)
#[1]  2  3  1  4  5  8  6  7 10 12  9 11 14 15 16 13

I like Matthew's approach way better but here was the way I did the problem:

x <- 1:100
fact <- 3

y <- ceiling(length(x)/fact)

unlist(lapply(split(x, rep(1:y, each =fact)[1:length(x)]), function(x){
    if (length(x)==1) return(x)
    sample(x)
}), use.names = FALSE)

##   [1]   3   1   2   6   4   5   8   9   7  11  10  12  13  15  14  17  16  18
##  [19]  20  21  19  24  23  22  26  27  25  29  30  28  31  32  33  35  34  36
##  [37]  39  37  38  41  42  40  45  43  44  47  46  48  51  49  50  52  53  54
##  [55]  57  56  55  59  60  58  63  62  61  64  66  65  67  68  69  70  71  72
##  [73]  75  73  74  77  76  78  80  79  81  82  84  83  85  86  87  90  89  88
##  [91]  92  91  93  96  94  95  98  99  97 100
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top