Question

I have these dates:

library(lubridate)
set.seed(50)
myDates <- ymd("2013-07-12") + days(sample(1:100, 20))
df <- data.frame(date=as.Date(myDates), value=sample(1:100, 20))
df[sample(1:20, 5, replace = FALSE), "value"] <- NA

         date value
1  2013-09-21    NA
2  2013-08-25    11
3  2013-08-01    NA
4  2013-09-25    96
5  2013-08-31    55
6  2013-07-17    27
7  2013-09-16    99
8  2013-09-11    66
9  2013-07-16    89
10 2013-07-22    37
11 2013-08-17    NA
12 2013-08-06    56
13 2013-09-07    NA
14 2013-07-19    39
15 2013-08-05    NA
16 2013-09-08    17
17 2013-10-20    54
18 2013-08-12    23
19 2013-10-07    71
20 2013-07-26    98

I want to write a function that splits the above date range, or any other date range, into 4 parts: the 1st, 2nd, 3rd and 4th quartiles of the date range. The function therefore needs to find the earliest and latest dates, then assign each row (and its value) to a quartile. The date range in the above data is:

range(df$date[!is.na(df$date)])
[1] "2013-07-16" "2013-10-20"

I then need the function to find the number of NA values in each quartile. Can this be done?
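One literal reading of the steps described above (take the earliest and latest date, cut that span into four parts of equal length, then count the NAs in each part) can be sketched in R as follows. The helper name count_na_by_range_quarter() is hypothetical, and equal-length parts are an assumption; the solution below instead uses quantiles of the dates, which put roughly the same number of rows in each part.

```r
# count_na_by_range_quarter() is a hypothetical helper, not part of any package.
# Assumption: "quartiles of the date range" means four intervals of equal
# length between the earliest and the latest date.
count_na_by_range_quarter <- function(dates, values) {
  d <- as.numeric(dates)                          # days since 1970-01-01
  rng <- range(d, na.rm = TRUE)                   # earliest and latest date
  breaks <- seq(rng[1], rng[2], length.out = 5)   # 5 cut points -> 4 intervals
  # Intervals are right-closed; include.lowest = TRUE keeps the earliest
  # date in interval 1, and labels = FALSE returns the interval number.
  bin <- cut(d, breaks = breaks, include.lowest = TRUE, labels = FALSE)
  # Count NA values per interval; intervals without NAs get 0
  nb.na <- tabulate(bin[is.na(values)], nbins = 4)
  data.frame(start = as.Date(breaks[1:4], origin = "1970-01-01"),
             end   = as.Date(breaks[2:5], origin = "1970-01-01"),
             nb.na = nb.na)
}
```

With the question's data frame it would be called as count_na_by_range_quarter(df$date, df$value). If all dates are identical, the five cut points coincide and cut() stops with a "breaks are not unique" error, so a production version would need to handle that case.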


Solution

Here is a suggestion:

# Create data
library(lubridate)
set.seed(50)
myDates <- ymd("2013-07-12") + days(sample(1:100, 20))
df <- data.frame(date=as.Date(myDates), value=sample(1:100, 20))
df[sample(1:20, 5, replace = FALSE), "value"] <- NA

#          date value
# 1  2013-09-21    NA
# 2  2013-08-25    NA
# 3  2013-08-01    70
# 4  2013-09-25    82
# 5  2013-08-31    30
# 6  2013-07-17    NA
# 7  2013-09-16    55
# 8  2013-09-11    NA
# 9  2013-07-16    96
# 10 2013-07-22    34
# 11 2013-08-17    33
# 12 2013-08-06    37
# 13 2013-09-07    39
# 14 2013-07-19    54
# 15 2013-08-05    99
# 16 2013-09-08    NA
# 17 2013-10-20    11
# 18 2013-08-12    59
# 19 2013-10-07    31
# 20 2013-07-26    38

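# Proposed solution
# Dates at the 25th, 50th, 75th and 100th percentiles of the dates
myQtle   <- quantile(as.POSIXlt(df$date), probs = 0.25 * 1:4)
# Cumulative number of NA values among rows dated on or before each quantile
myCumVal <- sapply(myQtle,
                   function(qtle, theDates, theValues){
                       sum(is.na(theValues[theDates <= qtle]))},
                   theDates  = as.POSIXlt(df$date),
                   theValues = df$value)

# Differencing the cumulative counts gives the number of NAs in each quartile
data.frame(qtle  = as.Date(myQtle),
           nb.na = c(myCumVal[1], diff(myCumVal)))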

#            qtle nb.na
# 25%  2013-07-30     1
# 50%  2013-08-21     0
# 75%  2013-09-12     3
# 100% 2013-10-20     1
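The cumulative sum followed by diff() can also be avoided by assigning each row to its quartile directly. The sketch below is an alternative, not the answerer's code: the helper na_by_date_quartile() is hypothetical, it works on the dates as day counts, and it uses findInterval() with left.open = TRUE so that, as with theDates <= qtle in the sapply() call above, a date equal to a quantile falls into the lower quartile.

```r
# na_by_date_quartile() is a hypothetical helper. It uses the same default
# quantile definition (type 7) and the same "date <= quantile" rule as the
# answer above, but assigns every row to a quartile directly.
na_by_date_quartile <- function(dates, values) {
  d <- as.numeric(dates)                       # days since 1970-01-01
  q <- quantile(d, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  # left.open = TRUE uses intervals (q[j], q[j + 1]], so d <= q[1] gives 0;
  # adding 1 numbers the four date quartiles 1 to 4
  quartile <- findInterval(d, q, left.open = TRUE) + 1L
  tabulate(quartile[is.na(values)], nbins = 4) # NA count per quartile
}
```

Applied to the dates and values listed in the comment block of this answer, it returns 1 0 3 1, the same counts as the nb.na column. Because quantile() with the default type 7 is linear in its input, working on day counts gives the same cut points as working on POSIXct seconds.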

Other tips

I believe the following sequence should help you with at least part of the problem (sorry for the clumsiness):

df <- df[order(df$date), ]               # sort by date
df$order <- seq_len(nrow(df))            # rank of each row after sorting
quartSize <- nrow(df) / 4                # number of rows per quartile
breakPts <- seq(1, nrow(df), quartSize)  # first rank of each quartile
# Vectorized: rows ranked below breakPts[2] go to quartile 1, and so on
df$quant <- ifelse(df$order < breakPts[2], 1,
                   ifelse(df$order < breakPts[3], 2,
                          ifelse(df$order < breakPts[4], 3, 4)))

If you then run table(df$quant, is.na(df[, 2]))[, 2], you'll get a tally of NAs in each quartile (here a quartile is a quarter of the rows in date order).

After sorting, the earliest date is in df[1, ] and the latest in df[nrow(df), ].
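The table() tally in this answer relies on [, 2] picking the TRUE column of is.na(); if a data set has no NAs at all, that column does not exist and the call stops with a "subscript out of bounds" error. A slightly more defensive variant, sketched here as a hypothetical helper na_tally(), fixes the factor levels first so every quartile and both is.na() outcomes are always present:

```r
# na_tally() is a hypothetical helper. Declaring the factor levels up front
# keeps every quartile and both is.na() outcomes in the table, so the "TRUE"
# column exists (filled with zeros) even when there are no NAs at all.
na_tally <- function(quant, values, n_groups = 4) {
  tab <- table(quartile = factor(quant, levels = seq_len(n_groups)),
               is_na    = factor(is.na(values), levels = c(FALSE, TRUE)))
  tab[, "TRUE"]   # NA count per quartile, named "1" to "4"
}
```

With the code above it would be called as na_tally(df$quant, df$value).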

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow