Append rle result from loop

https://stackoverflow.com/questions/12892985

07-07-2021

Question

I am running a coin-toss simulation with a loop which runs about 1 million times.

Each time I run the loop I wish to retain the table output from the RLE command. Unfortunately a simple append does not seem to be appropriate. Each time I run the loop I get a slightly different amount of data which seems to be one of the sticking points.

This code gives an idea of what I am doing:

N <- 5  # number of times to run
rlex <- NULL
# begin loop
for (i in 1:N) {  # repeat N times
  x <- sample(0:1, 100000, replace = TRUE)
  rlex <- append(rlex, rle(x))
}
table(rlex)    # doesn't work
table(rle(x))  # only 1 (the last iteration)

So instead of having five separate rle results (in this simulation, 1 million in the full version), I want one merged rle table. Hope this is clear. Obviously my actual code is a bit more complex, hence any solution should be as close to what I have specified as possible.

UPDATE: The loop is an absolute requirement. No ifs or buts. Perhaps I can pull out the table(rle(x)) data and put it into a matrix. However again the stumbling block is the fact that some of the less frequent run lengths do not always turn up in each loop. Thus I guess I am looking to conditionally fill a matrix based on the run length number?

Last update before I give up: Retaining the rle$values will mean that too much data is being retained. My simulation is large-scale and I really only wish to retain the table output of the rle. Either I retain each table(rle(x)) for each loop and combine by hand (there will be thousands), or I find a programmatic way to keep the data (yes for zeroes and ones) and have one table that is formed from merging each of the individual loops as I go along.

Either this is easyish to do, as specified, or I will not be doing it. It may seem a silly idea/request, but that should be incidental to whether it can be done.

Seriously last time. Here is an animated gif showing what I expect to happen.

After each iteration of the loop data is added to the table. This is as clear as I am going to be able to communicate it.

Solution

OK, attempt number 4:

N <- 5
set.seed(1)
x <- NULL
for (i in 1:N){
  x <- rbind(x, table(rle(sample(0:1, 100000, replace=TRUE))))
}

x <- as.data.frame(x)
x$length <- as.numeric(rownames(x))
aggregate(x[, 1:2], list(x[[3]]), sum)

Produces:

   Group.1     0     1
1        1 62634 62531
2        2 31410 31577
3        3 15748 15488
4        4  7604  7876
5        5  3912  3845
6        6  1968  1951
7        7   979   971
8        8   498   477
9        9   227   246
10      10   109   128
11      11    65    59
12      12    24    30
13      13    21    11
14      14     7    10
15      15     0     4
16      16     4     2
17      17     0     1
18      18     0     1

If you want the aggregation inside the loop, do:

N <- 5
set.seed(1)
x <- NULL
for (i in 1:N){
  x <- rbind(x, table(rle(sample(0:1, 100000, replace=TRUE))))
  y <- aggregate(x, list(as.numeric(rownames(x))), sum)
  print(y)
}

Other tips

Following up @CarlWitthoft's answer, you probably want:

N <- 5
rlex <-NULL
for (i in 1:N) {
    x <- sample(0:1, 100000, replace = TRUE)
    rlex <-append(rlex, rle(x)$lengths)
}

since I think you don't care about the $values component (i.e. whether each run is a run of zeros or ones).

Result: one long vector of run lengths.
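Tabulating that vector once the loop has finished gives the merged counts. The following is only a minimal sketch: it uses a smaller sample size than the question, and the name `merged` is made up for illustration.

```r
set.seed(1)
rlex <- NULL
for (i in 1:5) {
  x <- sample(0:1, 1000, replace = TRUE)
  rlex <- append(rlex, rle(x)$lengths)   # keep only the run lengths
}
merged <- table(rlex)   # one table covering all five iterations
print(merged)
```

Every run is stored until the end, so memory use grows with the number of iterations.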

But this would probably be a lot more efficient:

maxlen <- 30
rlemat <- matrix(nrow=N,ncol=maxlen)
for (i in 1:N) { 
    x <- sample(0:1, 100000, replace = TRUE)
    rlemat[i,] <- table(factor(rle(x)$lengths,levels=1:maxlen))
}

Result: an N by maxlen table of run lengths from each iteration.
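One thing to watch with a fixed maxlen: factor() turns any run longer than maxlen into NA, and table() drops NA by default, so such runs vanish without a warning. A small sketch with a deliberately tiny maxlen shows the effect; the variable `dropped` is an illustrative name, not part of the code above.

```r
x <- c(0, 0, 0, 0, 1, 1, 0)    # runs of length 4, 2 and 1
lens <- rle(x)$lengths
maxlen <- 3                    # deliberately too small for this input
counts <- table(factor(lens, levels = 1:maxlen))
dropped <- sum(lens > maxlen)  # runs that did not fit into the table
print(counts)
print(dropped)
```

Counting the dropped runs in each iteration is a cheap way to confirm that the chosen maxlen was large enough.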

If you only want to save the total number of runs of each length you could try:

rlecumsum <- rep(0,maxlen)
for (i in 1:N) { 
    x <- sample(0:1, 100000, replace = TRUE)
    rlecumsum <- rlecumsum + table(factor(rle(x)$lengths,levels=1:maxlen))
}

Result: a vector of length maxlen holding the total number of runs of each length across all iterations.

And here's my final answer:

rlecumtab <- matrix(0,ncol=2,nrow=maxlen)
for (i in 1:N) { 
   x <- sample(0:1, 100000, replace = TRUE)
   r1 <- rle(x)
   rtab <- table(factor(r1$lengths,levels=1:maxlen),r1$values)
   rlecumtab <- rlecumtab + rtab
}

Result: a maxlen by 2 table of the total numbers of run lengths across all iterations, divided by type (0-run vs 1-run).
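If you would rather not pick maxlen in advance, the running total can be grown whenever a longer run turns up. This is a rough sketch of that variation, not the method above: `add_runs` and `total` are hypothetical names, and it assumes the simulated values are only 0 and 1.

```r
# Hypothetical helper: add the run counts of one 0/1 vector to a running total.
add_runs <- function(total, x) {
  r <- rle(x)
  n <- max(nrow(total), max(r$lengths))   # rows needed after this update
  grown <- matrix(0, nrow = n, ncol = 2,
                  dimnames = list(seq_len(n), c("0", "1")))
  if (nrow(total) > 0) {
    grown[seq_len(nrow(total)), ] <- total   # carry over the old counts
  }
  # tabulate() counts how often each length 1..n occurs
  grown[, "0"] <- grown[, "0"] + tabulate(r$lengths[r$values == 0], nbins = n)
  grown[, "1"] <- grown[, "1"] + tabulate(r$lengths[r$values == 1], nbins = n)
  grown
}

set.seed(1)
total <- matrix(0, nrow = 0, ncol = 2)
for (i in 1:5) {
  total <- add_runs(total, sample(0:1, 100000, replace = TRUE))
}
print(total)
```

Only the small count matrix is kept between iterations, so memory use stays roughly constant however many times the loop runs.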

You need to read the help page for rle. Consider:

names(rlex)  #"lengths"  "values"  "lengths"  "values" .... and so on
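To see why this happens, it may help to look at what rle returns for a tiny input. The vector here is made up for illustration.

```r
x <- c(0, 0, 1, 1, 1, 0)
r <- rle(x)
print(r$lengths)   # 2 3 1
print(r$values)    # 0 1 0

# append() concatenates the list components instead of merging the runs
rlex <- NULL
rlex <- append(rlex, r)
rlex <- append(rlex, r)
print(names(rlex))   # "lengths" "values" "lengths" "values"
```

Each call adds two more list elements, and their lengths differ from one iteration to the next, so the combined list cannot be cross-tabulated directly.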

In the meantime, I strongly suggest you spend some time reading up on statistical methods. There is zero (+/- epsilon) chance that running a binomial simulation a million times will tell you anything you won't learn after a few hundred tries, unless your coin has p=1e-5 :-).

License: CC-BY-SA with attribution

Not affiliated with StackOverflow