I have two data.frames and I am using them to create a new variable, C (a standardized distance measure). Each data.frame contains the following information: coordinates, season, and variables. I want to calculate C between df.a and df.b for every unique coordinate-season combination (i.e. each XX, YY - X, Y pair by season). To this end I have merged the two data.frames into df.new to prepare for calculating C.
Here is how I currently would perform this operation:
# for example, for season = SUM
# V1 and VV1 are the same variable from the different data.frames, SEA = Season,
# X,Y and XX,YY are coordinates
df.new.SUM <- subset(df.new, SEA == "SUM") # Summer
# subsetting is needed only because the denominator must use a single season's values
# with() evaluates inside the subset and avoids leaving an attach()ed copy on the search path
df.new.SUM$C_V1 <- with(df.new.SUM, (V1 - VV1)^2 / sd(V1)^2)
df.new.SUM$C_V2 <- with(df.new.SUM, (V2 - VV2)^2 / sd(V2)^2)
df.new.SUM$C <- sqrt(rowSums(df.new.SUM[, c("C_V1", "C_V2")]))
# continue for other seasons and then rbind
However, this approach seems clunky. Is there a way to calculate C for each season-coordinate group without subsetting into a separate data.frame and then rbinding for each season? How can I use only one season without subsetting into a new data.frame? Or, even better, how do I do this for each season in a vectorized way? What R packages should I be exploring?
df.a <- structure(list(XX = c(10L, 10L, 11L, 11L, 12L, 12L, 13L, 13L,
14L, 14L), YY = c(20L, 20L, 21L, 21L, 22L, 22L, 23L, 23L, 15L,
15L), SEA = c("SUM", "WIN", "SUM", "WIN", "SUM", "WIN", "SUM",
"WIN", "SUM", "WIN"), VV1 = c(10.5, 15, 8, 8.5, 8, 7.5, 11, 13,
15, 10), VV2 = c(13, 3, 3.5, 6, 3.5, 3, 5, 4, 5, 5)), .Names = c("XX",
"YY", "SEA", "VV1", "VV2"), row.names = c(NA, -10L), class = "data.frame")
#
df.b <- structure(list(X = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Y = c(1L, 1L, 2L, 2L,
3L, 3L, 4L, 4L, 5L, 5L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L
), SEA = c("SUM", "WIN", "SUM", "WIN", "SUM", "WIN", "SUM", "WIN",
"SUM", "WIN", "SUM", "WIN", "SUM", "WIN", "SUM", "WIN", "SUM",
"WIN", "SUM", "WIN"), V1 = c(10, 12, 10, 9.5, 10, 14.5, 10.5,
13, 11.5, 14, 12.5, 8.5, 10, 7.5, 11, 7, 11, 8, 11, 14.5), V2 = c(3.5,
3, 3.5, 2.5, 3, 5, 5.5, 4, 2, 2.5, 3.5, 2, 3.5, 4.5, 5.5, 3.5,
5, 6, 6, 5)), .Names = c("X", "Y", "SEA", "V1", "V2"), row.names = c(NA,
-20L), class = "data.frame")
#
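# many-to-many join on season: each df.a row is paired with every df.b row of the same season
# (allow.cartesian is a data.table argument; base merge() on data.frames does not need it)
df.new <- merge(df.a, df.b, by = "SEA", all.x = TRUE)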
#
# EDIT: solution based on the suggestions below
df.out <- data.frame()
seasons <- unique(df.new$SEA)
for (s in seasons) {
  data <- subset(df.new, SEA == s)
  data$C <- sqrt(with(data, (V1 - VV1)^2 / sd(V1)^2 + (V2 - VV2)^2 / sd(V2)^2))
  df.out <- rbind(df.out, data)
}
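
For comparison, here is a minimal sketch of one way the loop might be avoided entirely, using base R's ave() to get a per-season standard deviation that lines up with the rows. The data.frame toy and its values are hypothetical stand-ins for df.new (same column names, made-up numbers), and sd_V1 and sd_V2 are illustrative names only, so treat this as one possible approach rather than the definitive one.

```r
# Toy stand-in for the merged data (hypothetical values, same column names as df.new)
toy <- data.frame(
  SEA = c("SUM", "SUM", "SUM", "WIN", "WIN", "WIN"),
  V1  = c(10, 12, 11, 9, 14, 8),
  VV1 = c(10.5, 8, 11, 15, 8.5, 13),
  V2  = c(3.5, 3, 5.5, 2.5, 5, 4),
  VV2 = c(13, 3.5, 5, 3, 6, 4)
)

# ave() applies FUN within each SEA group and returns a vector aligned with the rows,
# so each row gets the standard deviation of its own season without any subsetting
sd_V1 <- ave(toy$V1, toy$SEA, FUN = sd)
sd_V2 <- ave(toy$V2, toy$SEA, FUN = sd)

# Same formula as in the loop, now evaluated for all seasons at once
toy$C <- sqrt((toy$V1 - toy$VV1)^2 / sd_V1^2 + (toy$V2 - toy$VV2)^2 / sd_V2^2)
```

Replacing toy with df.new should give the same C values as the loop above, with the rows left in their original order instead of being regrouped by rbind.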