`spread = a - b` calculation on tall data with `dplyr` and `data.table`

Question 1

You ought to be able to work with this:

> merge(DT["a", ], DT["b",], by="Date")
         Date Variable.x Value.x Variable.y Value.y
1: 2014-04-02          a       2          b       3
2: 2014-04-03          a       3          b       2

The help page for merge.data.table suggests you read FAQ 1.12 for a detailed comparison of this with X[Y,...] approaches.
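For comparison, the X[Y, ...] form mentioned there might look roughly like the sketch below. It rebuilds the example table used elsewhere in this thread so that it runs on its own; the names dt_a, dt_b and res are made up for illustration, and `nomatch = NULL` assumes a reasonably recent data.table (older releases use `nomatch = 0`).

```r
library(data.table)

# Same example data as the data.table answer in this thread
DT <- data.table(
  Variable = rep(c("a", "b"), each = 3),
  Date = as.Date(c("2014-04-01", "2014-04-02", "2014-04-03",
                   "2014-04-02", "2014-04-03", "2014-04-04")),
  Value = c(1:3, 3:1)
)

# Split the tall table into one table per variable (hypothetical helper names)
dt_a <- DT[Variable == "a", list(Date, a = Value)]
dt_b <- DT[Variable == "b", list(Date, b = Value)]

# X[Y] join: look up each Date of dt_b in dt_a;
# nomatch = NULL drops dates that appear in only one of the two tables
res <- dt_a[dt_b, on = "Date", nomatch = NULL]
res[, spread := a - b]
print(res)
```

Unlike merge(), this keeps a single Date column and lets you name the value columns up front, at the cost of building the two intermediate tables yourself.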

Question 2

Here's one approach with dplyr. First we create the data:

require(dplyr)

df <- data.frame(
  Variable = rep(c("a", "b"), each = 3), 
  Date = rep(as.Date("2014-04-01") + 0:2, 2),
  Value = c(1:3, 3:1)
)

Rather than rotating into a wide form, we could use a vectorised comparison:

# %>% replaces the older %.% operator, which is no longer part of dplyr
df %>%
  group_by(Date) %>%
  summarise(spread = Value[Variable == "a"] - Value[Variable == "b"])

## Source: local data frame [3 x 2]
## 
##         Date spread
## 1 2014-04-01     -2
## 2 2014-04-02      0
## 3 2014-04-03      2

In the dplyr version used here, this correctly fails if a date has more than one value of a or b, because summarise() requires each result to have length one; later dplyr releases handle multi-row results differently, so check the behaviour of your installed version. The same approach works with data.table, but you need to check the results more carefully, because data.table is less strict (more flexible) than dplyr here.
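A minimal sketch of that data.table variant, with the length check written out by hand, could look like this. It uses the same data as df above; the local names a, b and res are just for illustration, and the stopifnot() call is one possible way to get the strictness back, not something data.table does for you.

```r
library(data.table)

# Same data as df above, as a data.table
DT <- data.table(
  Variable = rep(c("a", "b"), each = 3),
  Date = rep(as.Date("2014-04-01") + 0:2, 2),
  Value = c(1:3, 3:1)
)

res <- DT[, {
  a <- Value[Variable == "a"]
  b <- Value[Variable == "b"]
  # data.table will not complain about zero or several matches,
  # so insist on exactly one a and one b per date
  stopifnot(length(a) == 1L, length(b) == 1L)
  list(spread = a - b)
}, by = Date]
print(res)
```

If a date has a duplicated or missing a or b, the stopifnot() call stops with an error instead of letting a recycled or empty result slip through.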

You could also use the join approach suggested by BondedDust. It's not quite as convenient with dplyr as it is with data.table:

# %>% replaces the older %.% operator, which is no longer part of dplyr
a <- df %>% filter(Variable == "a") %>% select(-Variable)
b <- df %>% filter(Variable == "b") %>% select(-Variable)

inner_join(a, b, by = "Date") %>%
  mutate(spread = Value.x - Value.y)

##         Date Value.x Value.y spread
## 1 2014-04-01       1       3     -2
## 2 2014-04-02       2       2      0
## 3 2014-04-03       3       1      2

Question 3

Here's a data.table method using dcast.data.table.

I hope this is a useful start; there are still some open issues around selecting the NAs and the speed gain.

# Create Dataset
require(data.table)
require(reshape2)
DT <- data.table(Variable=c(rep("a",times = 3), rep("b", times=3)), 
             Date=as.Date(c("2014-04-01","2014-04-02","2014-04-03"
                            ,"2014-04-02", "2014-04-03","2014-04-04")),
             Value=c(1:3,3:1), key=c("Variable","Date"))

# using data.table
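# value.var is given explicitly so dcast does not have to guess the value column
DT2 <- dcast.data.table(DT, Date ~ Variable, value.var = "Value", drop = FALSE)
DT2[, spread := a - b, by = Date][!is.na(spread), ]
# I'm actually not clear about the difference between `drop = FALSE` and `drop = TRUE` here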

This is the output:

         Date a b spread
1: 2014-04-02 2 3     -1
2: 2014-04-03 3 2      1