Question

I have a data.frame (df in the code below):

Sector Var1 Var2 Var3
Abcd    1    1    1
Efgh    4    5    6
Ijkl    7    8    9

I would like to create a new data.frame that consists of the difference between each unique pairing of Sector for each variable. An example of the desired result is below:

# result
Sector1 Sector2 Dif_Var1 Dif_Var2 Dif_Var3
Abcd    Efgh        3        4        5
Abcd    Ijkl        6        7        8
Efgh    Ijkl        3        3        3

I can find each unique pair of Sector (code below), but I am unsure whether this is the right way to proceed. Does the minimal pseudocode below seem like an appropriate and potentially successful route?

# pseudo code
result <- which(Sector1 == Sector) get Var1,2,3 values - which(Sector2 == Sector) get Var1,2,3 values

Is there an R function or package that would facilitate arriving at the desired result?

# unique pairings
# http://stackoverflow.com/q/23024059/1670053
df <- structure(list(Sector = structure(1:3, .Label = c("Abcd", "Efgh", 
"Ijkl"), class = "factor"), Var1 = c(1L, 4L, 7L), Var2 = c(1L, 
5L, 8L), Var3 = c(1L, 6L, 9L)), .Names = c("Sector", "Var1", 
"Var2", "Var3"), class = "data.frame", row.names = c(NA, -3L))
df.u <- expand.grid(df$Sector,df$Sector)
df.u2 <- as.data.frame(unique(t(apply(df.u, 1, function(x) sort(x)))))
data <- subset(df.u2, ! df.u2$V1 == df.u2$V2)

Solution

Using the sqldf package:

library(sqldf)
sqldf("select A.Sector Sector1,
              B.Sector Sector2,
              B.Var1 - A.Var1 Var1, 
              B.Var2 - A.Var2 Var2, 
              B.Var3 - A.Var3 Var3
              from df A join df B
              on A.Sector < B.Sector")

giving:

  Sector1 Sector2 Var1 Var2 Var3
1    Abcd    Efgh    3    4    5
2    Abcd    Ijkl    6    7    8
3    Efgh    Ijkl    3    3    3

The select list can also be generated from the column names, so the differences do not have to be typed out for each variable:

 nms <- names(df)[-1]
 sel <- toString( sprintf('B.%s - A.%s %s', nms, nms, nms) )
 fn$sqldf("select A.Sector Sector1, B.Sector Sector2, $sel 
           from df A join df B 
           on A.Sector < B.Sector")
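
To see what is substituted for $sel in the query above, the string can be built and printed on its own. This is only a sketch: nms is written out by hand here and assumed to equal names(df)[-1] for the df in the question.

```r
# Column names to difference; assumed to match names(df)[-1] from the question
nms <- c("Var1", "Var2", "Var3")

# sprintf() is vectorised over nms, giving one "B.x - A.x x" term per column;
# toString() joins the terms with ", "
sel <- toString(sprintf("B.%s - A.%s %s", nms, nms, nms))
print(sel)
# [1] "B.Var1 - A.Var1 Var1, B.Var2 - A.Var2 Var2, B.Var3 - A.Var3 Var3"
```

The fn$ prefix (from the gsubfn package, which sqldf loads) then interpolates this string into the SQL text wherever $sel appears.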


OTHER TIPS

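You can just use the V1 and V2 vectors that you created to index df. Note that this relies on V1 and V2 being factors, which is what as.data.frame() produced by default before R 4.0.0; with character columns the lookup would go by row name and return NA rows.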

df[data$V2, -1] - df[data$V1, -1]
#    Var1 Var2 Var3
#2      3    4    5
#3      6    7    8
#3.1    3    3    3

To keep the sector names alongside the differences:

cbind(data$V1, data$V2, df[data$V2, -1] - df[data$V1, -1])
#    data$V1 data$V2 Var1 Var2 Var3
#2      Abcd    Efgh    3    4    5
#3      Abcd    Ijkl    6    7    8
#3.1    Efgh    Ijkl    3    3    3
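
As a variation that does not depend on V1 and V2 being factors, the pairs can be generated as row indices with combn(). This is a sketch rather than part of the original answer: the df below is rebuilt from the question's data, and the names idx, i, j and dif are illustrative.

```r
# Same data as in the question
df <- data.frame(Sector = c("Abcd", "Efgh", "Ijkl"),
                 Var1 = c(1L, 4L, 7L),
                 Var2 = c(1L, 5L, 8L),
                 Var3 = c(1L, 6L, 9L))

# Every unique pair of row indices; each column of idx is one pair (i < j)
idx <- combn(nrow(df), 2)
i <- idx[1, ]
j <- idx[2, ]

# Second row of each pair minus the first, for every non-Sector column
dif <- df[j, -1] - df[i, -1]
names(dif) <- paste0("Dif_", names(df)[-1])
rownames(dif) <- NULL  # drop the "2", "3", "3.1" row names left by indexing

result <- data.frame(Sector1 = df$Sector[i],
                     Sector2 = df$Sector[j],
                     dif)
result
```

Pairs come out in the row order of df, so if a particular ordering of Sector1 and Sector2 matters, sort df by Sector first.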
Licensed under: CC-BY-SA with attribution