Question

In a data frame (in R), I have two columns: the first is a list of species names (species), the second is the number of occurrence records I have for each species (number). The values in the number column vary widely. Most are below 100, but a few are very large (over 100,000), and there are many rows (~4000). Here is a simplified example:

    x <- data.frame(species = c("a","b","c","d","e","f","g","h","i","j"), number = c(53,17,67,989,135,67,13,786,100400,28))

Basically what I want to do is reduce the maximum number of records (the value in the number column) until the mean of all the values in this column stabilises.

To do this, I need to set a maximum limit for the values in the number column so that any value above this limit is reduced to the limit, and then record the mean. I want to repeat this multiple times, lowering the limit by 100 each time.
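A minimal sketch of a single capping step on the example data, using base R logical indexing to replace every value above the limit with the limit itself. The limit of 1000 is chosen only for illustration and is not from the question.

```r
# Example data from the question
x <- data.frame(species = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"),
                number  = c(53, 17, 67, 989, 135, 67, 13, 786, 100400, 28))

limit <- 1000  # illustrative maximum limit (assumption)

capped <- x$number
capped[capped > limit] <- limit  # any value above the limit becomes the limit

mean(x$number)  # uncapped mean: 10255.5
mean(capped)    # mean after capping at 1000: 315.5
```

Only the 100400 record exceeds 1000 here, so capping it alone pulls the mean from 10255.5 down to 315.5.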

I've not been able to find any similar questions online and am not really sure where to start with this! Any help, even just a point in the right direction, would be much appreciated! Cheers


Solution

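You should use the pmin function, which takes the element-wise minimum of its arguments and therefore caps every value at the limit: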

    # cap every value in x$number at 1000
    pmin(x$number, 1e3)
    # to test multiple limits, record the capped mean for each one:
    mns <- sapply(c(1e6, 1e4, 1e2), function(u) mean(pmin(x$number, u)))
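Extending the same pmin/sapply idea to the procedure in the question, here is a sketch that sweeps the limit from the largest observed value down to 100 in steps of 100 and records the mean and its change at each step. Starting at max(x$number) and stopping at 100 are assumptions; res is just an illustrative name.

```r
# Example data from the question
x <- data.frame(species = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"),
                number  = c(53, 17, 67, 989, 135, 67, 13, 786, 100400, 28))

# Limits from the largest value down to 100, lowering by 100 each time
# (start and end points are assumptions)
limits <- seq(from = max(x$number), to = 100, by = -100)

# Mean of the capped column for each limit; pmin caps element-wise
means <- sapply(limits, function(u) mean(pmin(x$number, u)))

# Collect the results plus the change in the mean from one limit to the next
res <- data.frame(limit = limits, mean = means)
res$change <- c(NA, diff(res$mean))  # first row has no previous limit

head(res)
tail(res)
```

With the example data, each step lowers the mean by exactly 10 while only the 100400 record exceeds the limit (100 / 10 rows), and the drops grow once more records are capped. To see where the curve levels off, plot(res$limit, res$mean, type = "l") is one option; what counts as "stable" (for example, an absolute change below some tolerance) is left open by the question.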
Licensed under: CC-BY-SA with attribution