Question

In a data frame (in R), I have two columns - the first is a list of species names (species), the second is the number of occurrence records I have for that species (number). There is a large variation in the number column with most values being <100 but a few being very high values (>100,000), and there are many rows (~4000). Here is a simplified example:

    x <- data.frame(species = c("a","b","c","d","e","f","g","h","i","j"),
                    number = c(53,17,67,989,135,67,13,786,100400,28))

Basically what I want to do is reduce the maximum number of records (the value in the number column) until the mean of all the values in this column stabilises.

To do this, I need to set a maximum limit for values in the number column so that any value > this limit is reduced to this maximum limit, and record the mean. I want to repeat this multiple times, each time reducing the maximum limit by 100.

I've not been able to find any similar questions online and am not really sure where to start with this! Any help, even just a point in the right direction, would be much appreciated! Cheers


Solution

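You should use the pmin() function. It takes the element-wise minimum of the column and the limit, so any value above the limit is reduced to the limit: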

    # cap every value in the column at 1000
    pmin(x$number, 1e3)

    # to test multiple limits:
    mns <- sapply(c(1e6, 1e4, 1e2), function(u) mean(pmin(x$number, u)))
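
To step the limit down by 100 each time, as the question describes, something along these lines might work. It is only a sketch on the example data: the names step, start, limits, capped_means and result are made up for illustration, and starting from the largest value rounded up to a multiple of 100 is an assumption, not something the question specifies.

```r
# Example data from the question
x <- data.frame(species = c("a","b","c","d","e","f","g","h","i","j"),
                number = c(53,17,67,989,135,67,13,786,100400,28))

# Assumed starting point: the largest value rounded up to a multiple of 100
step   <- 100
start  <- ceiling(max(x$number) / step) * step
limits <- seq(start, step, by = -step)   # start, start - 100, ..., 100

# Mean of the capped column for each limit
capped_means <- vapply(limits, function(u) mean(pmin(x$number, u)), numeric(1))

# One row per limit, plus the change in the mean from the previous (higher) limit
result <- data.frame(limit = limits, mean = capped_means)
result$change <- c(NA, diff(result$mean))

head(result)
```

Note that the capped mean can only go down or stay the same as the limit is lowered, so what counts as "stabilised" is a judgment call. Plotting result$mean against result$limit, or looking at where result$change gets small, is one way to pick a cutoff.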
License: CC-BY-SA with attribution