Question

I have a data.frame where I want to create a new variable based on two conditions. (1) The new variable is a pre-defined string that corresponds to an existing range of values for variable MONTH in the data.frame and (2) these strings are dependent on the positive or negative status of variable X in the data.frame.

Right now, I create subsets for positive and negative X values and then do the following:

month.neg <- subset(month,X < 0)
month.pos <- subset(month,X > 0)

month.pos$SEA[month.pos$MONTH == 12 | month.pos$MONTH == 1 | month.pos$MONTH == 2] <- "Winter"
month.pos$SEA[month.pos$MONTH == 3 | month.pos$MONTH == 4 | month.pos$MONTH == 5] <- "Spring"
month.pos$SEA[month.pos$MONTH == 6 | month.pos$MONTH == 7 | month.pos$MONTH == 8] <- "Summer"
month.pos$SEA[month.pos$MONTH == 9 | month.pos$MONTH == 10 | month.pos$MONTH == 11] <- "Fall"

month.neg$SEA[month.neg$MONTH == 12 | month.neg$MONTH == 1 | month.neg$MONTH == 2] <- "Summer"
month.neg$SEA[month.neg$MONTH == 3 | month.neg$MONTH == 4 | month.neg$MONTH == 5] <- "Fall"
month.neg$SEA[month.neg$MONTH == 6 | month.neg$MONTH == 7 | month.neg$MONTH == 8] <- "Winter"
month.neg$SEA[month.neg$MONTH == 9 | month.neg$MONTH == 10 | month.neg$MONTH == 11] <- "Spring"

month.new <- rbind(month.neg, month.pos)

I was considering doing something like if(month$X > 0), but this doesn't work on a data.frame column (i.e. error: the condition has length > 1 and only the first element will be used).

While the approach above works, it seems verbose. Is there a more concise way to do this? What package or procedure in R should I consider?

month <- structure(list(MONTH = c(1L, 2L, 2L, 3L, 3L, 3L, 4L, 5L, 5L, 
6L, 6L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 10L, 10L, 10L, 11L, 
11L, 11L, 11L, 12L, 12L), X = c(-0.25, -0.75, -0.25, 0.25, -0.75, 
0.25, -0.75, 0.25, -0.75, -0.25, 0.25, -0.75, 0.25, -0.75, -0.25, 
-0.75, -0.25, 0.25, 0.75, -0.25, -0.25, -0.75, 0.25, 0.25, 0.75, 
-0.25, -0.25, -0.75, -0.25)), .Names = c("MONTH", "X"), class = "data.frame", row.names = c(NA, 
-29L))

Solution

Your if(month$X > 0) attempt fails because if is not vectorized, whereas ifelse is. You want something like:

month_map <- rep(c("Winter", "Spring", "Summer", "Fall"), each = 3)
month_map <- c(month_map[-1], month_map[1]) # tag December to be Winter

# shift southern-hemisphere rows (X < 0) by 6 months, then look up the season
month$SEA <- month_map[1 + (month$MONTH - 1 + ifelse(month$X < 0, 6, 0)) %% 12]

The expression ifelse(month$X < 0, 6, 0) adds 6 months when X is negative (i.e. you are in the southern hemisphere), which matches the season mapping in your question. Subtracting 1 before taking %% 12 and adding 1 afterwards wraps the shifted month back into the range 1 to 12, so 13 and 14 become 1 and 2 while 12 stays 12.
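To make the index arithmetic easier to follow, here is a minimal sketch that breaks the one-liner into its intermediate steps. The MONTH and X values are made up for illustration, and the names shift, shifted, and index are not part of the answer above.

```r
# Season lookup indexed by month number (1 = January), northern-hemisphere labels.
# Same contents as month_map above, written out directly.
month_map <- c("Winter", "Winter",
               "Spring", "Spring", "Spring",
               "Summer", "Summer", "Summer",
               "Fall", "Fall", "Fall",
               "Winter")

# Illustrative values only (assumption, not taken from the question's data).
MONTH <- c(1L, 6L, 7L, 8L, 12L)
X     <- c(-0.25, 0.25, -0.75, -0.25, -0.75)

shift   <- ifelse(X < 0, 6, 0)             # vectorized: one shift per element
shifted <- MONTH + shift                   # can run past 12 (here: 13, 14, 18)
index   <- 1 + (MONTH - 1 + shift) %% 12   # wrapped back into 1..12
season  <- month_map[index]

print(data.frame(MONTH, X, shift, shifted, index, season))
```

Working in 0-based months (MONTH - 1) before the modulo and adding 1 afterwards is what keeps month 12 from collapsing to 0, which would not be a valid index into month_map.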

Example

month.pos <- data.frame(MONTH = round(runif(100, 1, 12)))
month <- data.frame(X = runif(100, -1, 1))
head(cbind(month.pos, month), 10)
 #    MONTH           X
 # 1      8 -0.55105406
 # 2      3  0.97186211
 # 3      9 -0.99687710
 # 4      6 -0.92899175
 # 5      7 -0.61108006
 # 6     10  0.66565870
 # 7      4  0.77975565
 # 8     10 -0.54498417
 # 9      7 -0.04759831
 # 10    10 -0.26378151

 month.pos$SEA <- month_map[1 + (month.pos$MONTH - 1 + ifelse(month$X < 0, 6, 0)) %% 12]
 head(month.pos, 10)
 #    MONTH    SEA
 # 1      8 Winter
 # 2      3 Spring
 # 3      9 Spring
 # 4      6 Winter
 # 5      7 Winter
 # 6     10   Fall
 # 7      4 Spring
 # 8     10 Spring
 # 9      7 Winter
 # 10    10 Spring
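As a sanity check, the sketch below applies the same lookup to the data frame posted in the question and compares the result with the explicit month-by-month rules from the question. The helper names north, flip, and expected are hypothetical and only used for the comparison. One assumption to be aware of: rows with X exactly 0 would be treated as positive here, whereas the original subset() calls drop them; the posted data has no such rows.

```r
# Data copied from the dput() output in the question.
month <- data.frame(
  MONTH = c(1L, 2L, 2L, 3L, 3L, 3L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 7L, 8L,
            8L, 8L, 8L, 9L, 9L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 12L, 12L),
  X = c(-0.25, -0.75, -0.25, 0.25, -0.75, 0.25, -0.75, 0.25, -0.75, -0.25,
        0.25, -0.75, 0.25, -0.75, -0.25, -0.75, -0.25, 0.25, 0.75, -0.25,
        -0.25, -0.75, 0.25, 0.25, 0.75, -0.25, -0.25, -0.75, -0.25)
)

# Lookup table: index = month number, value = northern-hemisphere season.
month_map <- c(rep("Winter", 2),
               rep(c("Spring", "Summer", "Fall"), each = 3),
               "Winter")

# One-step assignment on the full data frame; no subsetting or rbind needed.
month$SEA <- month_map[1 + (month$MONTH - 1 + ifelse(month$X < 0, 6, 0)) %% 12]

# Cross-check against the explicit rules written out in the question.
north <- ifelse(month$MONTH %in% c(12, 1, 2), "Winter",
         ifelse(month$MONTH %in% 3:5, "Spring",
         ifelse(month$MONTH %in% 6:8, "Summer", "Fall")))
flip <- c(Winter = "Summer", Spring = "Fall", Summer = "Winter", Fall = "Spring")
expected <- ifelse(month$X < 0, unname(flip[north]), north)

stopifnot(identical(month$SEA, expected))
head(month)
```

Unlike the subset-and-rbind version, this keeps the rows in their original order.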
Licensed under: CC-BY-SA with attribution