Question

I am building code to run and manage simulations of sampling events at sites that can be in one of three site cohorts. I use rep() to assign the cohort identifier (1,2, or 3) using the code below:

cohort <- rep(1:n.cohorts, n.sites) 

I have put the key line first, although to reproduce my problem you need to run the following lines, which allocate a total number of sites between cohorts for presentation to the rep() call.

n.cohorts <- 3
s <- 10 # total available sites in this example

# different proportions of the total can be allocated to each cohort, for example 
prop.control <- 0.4 ; prop.int <- 0.4 ; prop.ref <- 1-(prop.int+prop.control)
n.control <- prop.control * s; n.int <- prop.int * s; n.ref <- prop.ref * s 
n.sites <- c(n.control, n.int, n.ref)  

now, n.sites by itself returns

[1] 4 4 2

so when I run my cohort <- rep(1:n.cohorts, n.sites) call again I expect cohort to be a list of 10 items, like this: [1] 1 1 1 1 2 2 2 2 3 3. What i get, however, is only 9:

> cohort
[1] 1 1 1 1 2 2 2 2 3    

If i run the same code where n.sites is defined directly like so: n.sites <- c(4, 4, 2), I get the 10 items i expect. I have redone this several times to convince myself that under both scenarios n.sites by itself produces identical results.

Can anyone explain why this happens? Many thanks in advance.

David

Was it helpful?

Solution

I think this is one of those arithmetical inaccuracy issues in R. The problem is here:

prop.ref <- 1-prop.int-prop.control
prop.ref*10
#[1] 2
floor(prop.ref*10)
#[1] 1

So r thinks that prop.int+prop.control are very slightly bigger than 0.8

You can fix it by

cohort <- rep(1:n.cohorts, ceiling(n.sites)) 

But you're right, it does seem like a SERIOUS bug EDIT - sorry meant SEEM like a serious

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top