Calculate differences between unique pairings of data.frame variables
22-12-2019
Question
I have a data.frame (df in the code below):
Sector Var1 Var2 Var3
Abcd 1 1 1
Efgh 4 5 6
Ijkl 7 8 9
I would like to create a new data.frame that consists of the difference between each unique pairing of Sector for each variable. An example of the desired result is below:
# result
Sector1 Sector2 Dif_Var1 Dif_Var2 Dif_Var3
Abcd Efgh 3 4 5
Abcd Ijkl 6 7 8
Efgh Ijkl 3 3 3
I can find each unique pair of Sector values (code below), but am unsure if this is the way to proceed. Does the minimal pseudo code below seem like an appropriate and potentially successful route?
# pseudo code
result <- which(Sector1 == Sector) get Var1,2,3 values - which(Sector2 == Sector) get Var1,2,3 values
Is there an R function or package that would facilitate arriving at the desired result?
# unique pairings
# http://stackoverflow.com/q/23024059/1670053
df <- structure(list(Sector = structure(1:3, .Label = c("Abcd", "Efgh",
"Ijkl"), class = "factor"), Var1 = c(1L, 4L, 7L), Var2 = c(1L,
5L, 8L), Var3 = c(1L, 6L, 9L)), .Names = c("Sector", "Var1",
"Var2", "Var3"), class = "data.frame", row.names = c(NA, -3L))
df.u <- expand.grid(df$Sector,df$Sector)
df.u2 <- as.data.frame(unique(t(apply(df.u, 1, function(x) sort(x)))))
data <- subset(df.u2, ! df.u2$V1 == df.u2$V2)
Solution
Using the sqldf package:
library(sqldf)
sqldf("select A.Sector Sector1,
B.Sector Sector2,
B.Var1 - A.Var1 Var1,
B.Var2 - A.Var2 Var2,
B.Var3 - A.Var3 Var3
from df A join df B
on A.Sector < B.Sector")
giving:
Sector1 Sector2 Var1 Var2 Var3
1 Abcd Efgh 3 4 5
2 Abcd Ijkl 6 7 8
3 Efgh Ijkl 3 3 3
This could also be written:
nms <- names(df)[-1]
sel <- toString( sprintf('B.%s - A.%s %s', nms, nms, nms) )
fn$sqldf("select A.Sector Sector1, B.Sector Sector2, $sel
from df A join df B
on A.Sector < B.Sector")
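To see what the $sel placeholder expands to, the string can be built and printed on its own. This is only a sketch in base R: nms is hard-coded here to stand in for names(df)[-1], and the Dif_ prefix on the column aliases is an assumption added so the names match the desired result in the question.

```r
# Stand-in for names(df)[-1]; hard-coded so the snippet runs on its own.
nms <- c("Var1", "Var2", "Var3")

# sprintf() is vectorised over nms, giving one "B.x - A.x alias" term per
# variable; toString() joins the terms with ", ".
# The Dif_ prefix is an assumption, not part of the answer above.
sel <- toString(sprintf("B.%s - A.%s Dif_%s", nms, nms, nms))

print(sel)
```

Substituting a string built this way into the fn$sqldf call should name the difference columns Dif_Var1, Dif_Var2 and Dif_Var3, and it adapts to however many variables df has.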
OTHER TIPS
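You can just use the V1 and V2 vectors that you created (in data) to index df. Note that this relies on V1 and V2 being factors whose integer codes match the row positions in df; if they are character vectors, as as.data.frame produces by default from R 4.0.0 onward, the indexing looks up row names instead and returns NA rows.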
df[data$V2, -1] - df[data$V1, -1]
# Var1 Var2 Var3
#2 3 4 5
#3 6 7 8
#3.1 3 3 3
To keep the sector names:
cbind(data$V1, data$V2, df[data$V2, -1] - df[data$V1, -1])
# data$V1 data$V2 Var1 Var2 Var3
#2 Abcd Efgh 3 4 5
#3 Abcd Ijkl 6 7 8
#3.1 Efgh Ijkl 3 3 3
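A variation on the same idea is to build the pairs as row positions with combn(), so the result does not depend on how the V1 and V2 columns are stored. This is a minimal sketch, not part of the answers above; the Dif_ column prefix and the name result are assumptions taken from the desired output in the question.

```r
# Example data from the question, with Sector kept as plain character.
df <- data.frame(
  Sector = c("Abcd", "Efgh", "Ijkl"),
  Var1 = c(1L, 4L, 7L),
  Var2 = c(1L, 5L, 8L),
  Var3 = c(1L, 6L, 9L),
  stringsAsFactors = FALSE
)

# Each column of idx is one unique pair of row positions (i < j).
idx <- combn(nrow(df), 2)

# Second row of each pair minus the first, for every non-Sector column.
diffs <- df[idx[2, ], -1] - df[idx[1, ], -1]
names(diffs) <- paste0("Dif_", names(diffs))

result <- data.frame(
  Sector1 = df$Sector[idx[1, ]],
  Sector2 = df$Sector[idx[2, ]],
  diffs
)
rownames(result) <- NULL  # drop the row names inherited from the subsetting

print(result)
```

Because combn() works on positions rather than labels, this should also behave sensibly if Sector values are not in sorted order, though the pairs then follow row order rather than alphabetical order.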
Licensed under: CC-BY-SA with attribution