Join data.tables based on unequal timestamp

Question 1

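You can use data.table's roll argument, which performs a rolling join: each row of A is matched to the last row of B whose status_dt is at or before that row's timestamp (last observation carried forward). Here A's first column is joined against B's key, status_dt: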

setkey(B, status_dt)
B[A, roll=TRUE]

Produces:

              status_dt station_id availability Start.Station.ID
 1: 2014-04-06 21:07:42        225    0.4864865              225
 2: 2014-04-06 21:06:50        225    0.4864865              225
 3: 2014-04-06 21:06:49        225    0.4864865              225
 4: 2014-04-06 21:06:15        225    0.4864865              225
 5: 2014-04-06 21:04:35        225    0.4864865              225
 6: 2014-04-06 21:05:33        225    0.4864865              225
 7: 2014-04-06 21:04:45        225    0.4864865              225
 8: 2014-04-06 21:04:37        225    0.4864865              225
 9: 2014-04-06 21:04:35        225    0.4864865              225
10: 2014-04-06 21:01:45        225    0.4864865              225
11: 2014-04-06 21:00:57        225    0.4864865              225
12: 2014-04-06 20:59:04        225    0.4864865              225
13: 2014-04-06 20:58:04        225    0.8648649              225
14: 2014-04-06 20:57:22        225    0.8648649              225
15: 2014-04-06 20:57:24        225    0.8648649              225
16: 2014-04-06 20:56:40        225    0.8648649              225
17: 2014-04-06 20:55:52        225    0.8648649              225
18: 2014-04-06 20:55:25        225    0.8648649              225
19: 2014-04-06 20:55:24        225    0.8648649              225
20: 2014-04-06 20:55:00        225    0.8648649              225
21: 2014-04-06 18:25:30        225    0.9729730              225
22: 2014-04-06 18:25:28        225    0.9729730              225
              status_dt station_id availability Start.Station.ID

This closely matches your expected output, except for a few extra rows that, as far as I can tell, are legitimate given your description of the problem.
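Since the original A and B are not reproduced here, the following is a minimal self-contained sketch of the same rolling join on small hypothetical toy tables (the timestamps and availability values are made up for illustration). It names the join column explicitly with `on =` instead of relying on A's first column lining up with B's key:

```r
library(data.table)

# Hypothetical toy data: B holds availability snapshots, A holds trip start times
B <- data.table(
  status_dt    = as.POSIXct(c("2014-04-06 18:00:00", "2014-04-06 20:30:00",
                              "2014-04-06 21:00:00"), tz = "UTC"),
  station_id   = 225L,
  availability = c(0.97, 0.86, 0.49)
)
A <- data.table(
  status_dt        = as.POSIXct(c("2014-04-06 18:25:28", "2014-04-06 20:55:00",
                                  "2014-04-06 21:04:35"), tz = "UTC"),
  Start.Station.ID = 225L
)

# roll = TRUE: for each row of A, take the last row of B at or before its status_dt
res <- B[A, on = "status_dt", roll = TRUE]
print(res)
```

Each trip picks up the most recent snapshot: 0.97 for 18:25:28, 0.86 for 20:55:00 and 0.49 for 21:04:35.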

Question 2

I mostly use the zoo or xts packages, which were essentially written for this kind of time-series alignment.

R> library(xts)
R> dfA <- as.data.frame(A)
R> a <- xts(dfA[,2], order.by=dfA[,1])
R> dfB <- as.data.frame(B)
R> b <- xts(dfB[,-1], order.by=dfB[,1])

Now that we have two xts objects, we can simply merge() them and run na.locf() over the result to fill each NA with the most recent prior value:

R> na.locf(merge(a, b))
                      a station_id availability
2014-04-06 17:59:03  NA        225     0.972973
2014-04-06 18:25:28 225        225     0.972973
2014-04-06 18:25:30 225        225     0.972973
2014-04-06 18:59:03 225        225     0.621622
2014-04-06 20:29:03 225        225     0.864865
2014-04-06 20:55:00 225        225     0.864865
2014-04-06 20:55:24 225        225     0.864865
2014-04-06 20:55:25 225        225     0.864865
2014-04-06 20:55:52 225        225     0.864865
2014-04-06 20:56:40 225        225     0.864865
2014-04-06 20:57:22 225        225     0.864865
2014-04-06 20:57:24 225        225     0.864865
2014-04-06 20:58:04 225        225     0.864865
2014-04-06 20:59:02 225        225     0.486486
2014-04-06 20:59:04 225        225     0.486486
2014-04-06 21:00:57 225        225     0.486486
2014-04-06 21:01:45 225        225     0.486486
2014-04-06 21:04:35 225        225     0.486486
2014-04-06 21:04:35 225        225     0.486486
2014-04-06 21:04:37 225        225     0.486486
2014-04-06 21:04:45 225        225     0.486486
2014-04-06 21:05:33 225        225     0.486486
2014-04-06 21:06:15 225        225     0.486486
2014-04-06 21:06:49 225        225     0.486486
2014-04-06 21:06:50 225        225     0.486486
2014-04-06 21:07:42 225        225     0.486486
2014-04-06 21:59:02 225        225     0.162162
2014-04-06 23:29:02 225        225     0.162162
R>
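As a self-contained illustration of the merge()/na.locf() step, here is a sketch on hypothetical toy series (made-up times and values, not the data from the question). `ts_utc` is just a small helper introduced for this example:

```r
library(xts)  # attaches zoo as well; xts provides an na.locf() method

# Hypothetical helper to parse timestamps in UTC
ts_utc <- function(s) as.POSIXct(s, tz = "UTC")

# b: availability snapshots; a: trip start times carrying the station id
b <- xts(cbind(station_id = 225, availability = c(0.97, 0.86, 0.49)),
         order.by = ts_utc(c("2014-04-06 18:00:00", "2014-04-06 20:30:00",
                             "2014-04-06 21:00:00")))
a <- xts(c(225, 225, 225),
         order.by = ts_utc(c("2014-04-06 18:25:28", "2014-04-06 20:55:00",
                             "2014-04-06 21:04:35")))

# Union of both time indexes; NAs are then filled with the last non-NA value
m <- na.locf(merge(a, b))

# Keep only a's timestamps and drop a's own column
res <- m[index(a), -1]
print(res)
```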

But there ought to be a data.table answer for this too...

Edit: Per the comment, here is the merge restricted to just the timestamps of a:

R> na.locf(merge(a, b))[index(a), -1]
                    station_id availability
2014-04-06 18:25:28        225     0.972973
2014-04-06 18:25:30        225     0.972973
2014-04-06 20:55:00        225     0.864865
2014-04-06 20:55:24        225     0.864865
2014-04-06 20:55:25        225     0.864865
2014-04-06 20:55:52        225     0.864865
2014-04-06 20:56:40        225     0.864865
2014-04-06 20:57:22        225     0.864865
2014-04-06 20:57:24        225     0.864865
2014-04-06 20:58:04        225     0.864865
2014-04-06 20:59:04        225     0.486486
2014-04-06 21:00:57        225     0.486486
2014-04-06 21:01:45        225     0.486486
2014-04-06 21:04:35        225     0.486486
2014-04-06 21:04:35        225     0.486486
2014-04-06 21:04:37        225     0.486486
2014-04-06 21:04:45        225     0.486486
2014-04-06 21:05:33        225     0.486486
2014-04-06 21:06:15        225     0.486486
2014-04-06 21:06:49        225     0.486486
2014-04-06 21:06:50        225     0.486486
2014-04-06 21:07:42        225     0.486486
R>

In this particular case I also dropped the first column (a, i.e. A's Start.Station.ID), which is redundant with station_id.
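Both approaches boil down to a "most recent value at or before t" lookup. As a sketch of that logic without either package, base R's findInterval() can do the same lookup on sorted timestamps; this is a swapped-in technique for illustration, using hypothetical toy values:

```r
# Hypothetical helper to parse timestamps in UTC
ts_utc <- function(s) as.POSIXct(s, tz = "UTC")

# Snapshot times must be sorted for findInterval()
b_time  <- ts_utc(c("2014-04-06 18:00:00", "2014-04-06 20:30:00",
                    "2014-04-06 21:00:00"))
b_avail <- c(0.97, 0.86, 0.49)
a_time  <- ts_utc(c("2014-04-06 17:00:00", "2014-04-06 18:25:28",
                    "2014-04-06 21:04:35"))

# Index of the last snapshot at or before each a_time; 0 means "none earlier"
idx <- findInterval(a_time, b_time)

# Map 0 to NA so trips before the first snapshot get NA, like the leading NA above
avail <- b_avail[replace(idx, idx == 0L, NA)]
print(avail)
```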