why does rep() behave inconsistently with this simple R example?

https://stackoverflow.com/questions/21398586

03-10-2022
|

Question

I am building code to run and manage simulations of sampling events at sites that can be in one of three site cohorts. I use rep() to assign the cohort identifier (1,2, or 3) using the code below:

cohort <- rep(1:n.cohorts, n.sites)

I have put the key line first, although to reproduce my problem you need to run the following lines, which allocate a total number of sites between cohorts for presentation to the rep() call.

n.cohorts <- 3
s <- 10 # total available sites in this example

# different proportions of the total can be allocated to each cohort, for example 
prop.control <- 0.4 ; prop.int <- 0.4 ; prop.ref <- 1-(prop.int+prop.control)
n.control <- prop.control * s; n.int <- prop.int * s; n.ref <- prop.ref * s 
n.sites <- c(n.control, n.int, n.ref)

now, n.sites by itself returns

[1] 4 4 2

so when I run my cohort <- rep(1:n.cohorts, n.sites) call again I expect cohort to be a list of 10 items, like this: [1] 1 1 1 1 2 2 2 2 3 3. What i get, however, is only 9:

> cohort
[1] 1 1 1 1 2 2 2 2 3

If i run the same code where n.sites is defined directly like so: n.sites <- c(4, 4, 2), I get the 10 items i expect. I have redone this several times to convince myself that under both scenarios n.sites by itself produces identical results.

Can anyone explain why this happens? Many thanks in advance.

David

Solution

I think this is one of those arithmetical inaccuracy issues in R. The problem is here:

prop.ref <- 1-prop.int-prop.control
prop.ref*10
#[1] 2
floor(prop.ref*10)
#[1] 1

So r thinks that prop.int+prop.control are very slightly bigger than 0.8

You can fix it by

cohort <- rep(1:n.cohorts, ceiling(n.sites))

But you're right, it does seem like a SERIOUS bug EDIT - sorry meant SEEM like a serious

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow