Selecting two random numbers via bootstrapping

https://stackoverflow.com/questions/17198581

01-06-2022

Question

I have a dataset of 1020 size measurements. I need to create a new dataset from these 1020 numbers by randomly drawing numbers with replacement. However, the random sampling has to work in the following way:

1. Draw two numbers at random from the original dataset.
2. Select the larger of these two random numbers.
3. Put this larger number into the new dataset.
4. Repeat steps 1-3 until the new dataset has 1020 sizes (like the original dataset), and repeat the whole procedure until there are 10000 new datasets of 1020 sizes each.

I can already create 10000 new datasets from the original dataset by randomly drawing numbers from it with the bootstrap method:

a <- numeric(10000)
for(i in 1:10000) a[i] <- sample(size, replace = T)

But I do not know how to adapt the command above so that it draws two random numbers, selects the larger one, and puts that larger one into the new dataset.

Could it be something like the following?

b <- numeric(10000)
for(i in 1:10000) b[i] <- sample(size, 2, ......, replace = T))

And then some command (which I do not know) would go where the dots are, to put the larger of the two numbers into the new datasets?

Solution

I think this might do what you want. y1 will contain the first draw of every pair and y2 will contain the second. The pmax function takes the element-wise larger of the two, and the matrix command arranges the result into a matrix with 1020 rows and 10000 columns, so that each column is one new dataset. You might want to replace some of these 'magic' numbers with variables in your script so that you can easily try small samples for testing purposes.

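# size is the vector of 1020 measurements from the question
# (a bare `data` would refer to R's built-in data() function)
y1 <- sample(size, 1020 * 10000, replace = TRUE)  # first draw of each pair
y2 <- sample(size, 1020 * 10000, replace = TRUE)  # second draw of each pair

# larger value of each pair; one column per new dataset
bigDat <- matrix(pmax(y1, y2), nrow = 1020)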
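
The advice about replacing the 'magic' numbers with variables could look roughly like the sketch below. The names n_obs and n_sets, the seed, and the simulated size vector are assumptions made for illustration only; with the real data, size would be the 1020 measurements and n_sets would be 10000.

```r
set.seed(42)                              # only to make the sketch reproducible
size   <- rnorm(20, mean = 50, sd = 10)   # hypothetical stand-in for the real measurements
n_obs  <- length(size)                    # sizes per new dataset (1020 in the question)
n_sets <- 5                               # number of new datasets (10000 in the question)

# two independent draws with replacement for every cell of the result
y1 <- sample(size, n_obs * n_sets, replace = TRUE)
y2 <- sample(size, n_obs * n_sets, replace = TRUE)

# keep the larger value of each pair; each column is one new dataset
bigDat <- matrix(pmax(y1, y2), nrow = n_obs)
dim(bigDat)
```

Starting with small values like these makes it easy to inspect bigDat by eye before scaling up to the full 1020 by 10000 case.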

Other tips

I'm having a hard time imagining why you would want to do this, but ... here is an example on a much smaller scale. I created some fake data, df, with 10 measurements and generated 3 bootstrap samples as you describe. In real life you would replace df with your real data frame of 1020 measurements and set nboots equal to 10000.

# fake data: 10 measurements
df <- data.frame(meas = rnorm(10))
# number of bootstrap samples you want
nboots <- 3

# number of rows in fake data
n <- nrow(df)
# row indices for two independent draws per position: n x nboots x 2
init <- array(sample(1:n, n * 2 * nboots, replace = TRUE), dim = c(n, nboots, 2))
# keep only the bigger measurement from each pair of draws
bootmeas <- matrix(pmax(df$meas[init[, , 1]], df$meas[init[, , 2]]), nrow = n)
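
For comparison, the steps from the question can also be written out literally, one pair at a time. This is only a sketch: draw_larger is a hypothetical helper name, and the simulated size vector stands in for the real measurements. It is likely to be much slower than the vectorized pmax versions at the full 1020 by 10000 scale, but it mirrors the described procedure step by step.

```r
set.seed(1)          # only to make the sketch reproducible
size <- rnorm(10)    # hypothetical stand-in for the real measurements

# steps 1-3: draw two values with replacement and keep the larger one
draw_larger <- function(x) max(sample(x, 2, replace = TRUE))

# step 4: repeat to fill one dataset, then repeat for every dataset
n_sets <- 3
boot_sets <- replicate(n_sets, replicate(length(size), draw_larger(size)))
dim(boot_sets)       # one column per new dataset
```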

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow