Edit: I got bitten by a subtle data.table behavior. data.table keeps a key on summarized data, but only on the columns you grouped by. Since a keyed table joins on its own key columns, the join was matching on `y` alone and wasn't doing what I thought it was doing. Here is the exact same logic, but with one interim step to unset the partial key on the grouped data:
# data generated with `set.seed(1)`
library(data.table)
dt <- data.table(x, y, z)[!is.na(x)]
setkey(dt, y, x) # among other things, this sorts `dt` by `y`, then `x`, quickly
sub.dt <- dt[, list(x=x[[1]]), by=y][, list(y, x)] # get low X for each Y, and reorder cols to match key
setkey(sub.dt, NULL) # need to remove key as otherwise would join only on `y`
dt[sub.dt, paste(x, y, z, sep="_")] # now join
Produces:
y x V1
1: A 1 1_A_313
2: B 2 2_B_782
3: B 2 2_B_6008
4: B 2 2_B_7230
5: C 2 2_C_2993
6: D 2 2_D_4762
7: E 2 2_E_239
8: E 2 2_E_4581
9: F 3 3_F_4114
10: F 3 3_F_4712
...
41: S 2 2_S_3113
42: S 2 2_S_7949
43: T 2 2_T_4570
44: U 1 1_U_671
45: V 2 2_V_178
46: W 2 2_W_1817
47: W 2 2_W_2233
48: X 1 1_X_648
49: Y 2 2_Y_857
50: Y 2 2_Y_7227
51: Z 3 3_Z_6526
y x V1
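To see the partial key for yourself, here is a minimal sketch on a small made-up table (`toy` and its values are my own illustration, not the data from the question). It also shows one way to sidestep the issue, assuming a data.table version that supports the `on=` argument: naming the join columns explicitly means the join no longer depends on whatever key the grouped table carries.

```r
library(data.table)

# toy data, invented for illustration only
toy <- data.table(
  y = c("A", "A", "B", "B", "B"),
  x = c(2L, 1L, 3L, 3L, 5L),
  z = c(10L, 20L, 30L, 40L, 50L)
)
setkey(toy, y, x)  # sorts by y, then x

# lowest x for each y; toy is sorted by its key, so x[[1]] is the group minimum
sub.toy <- toy[, list(x = x[[1]]), by = y]
key(sub.toy)  # inspect the key: the grouped result may carry a key on `y` only

# spell out the join columns so the join does not rely on any key
res <- toy[sub.toy, on = c("y", "x")]
res
```

With `on=` there is no need for the `setkey(sub.dt, NULL)` step, at the cost of requiring a newer data.table than the one this answer was originally written against.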
Edit2: a cleaner version kindly contributed by Arun in the comments:
dt[dt[, .I[x==min(x)], by=y][, V1]]
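In case the one-liner is hard to read, here it is split into two steps on a small made-up table (`toy2`, `idx` and `res2` are names I chose for illustration). `.I` holds the row numbers of the original table, so the inner call collects the row numbers where `x` equals the group minimum, and the outer call subsets by those row numbers. No key is needed for this.

```r
library(data.table)

# toy data, invented for illustration only; deliberately left unkeyed and unsorted
toy2 <- data.table(
  y = c("A", "A", "B", "B", "B"),
  x = c(2L, 1L, 3L, 3L, 5L),
  z = c(10L, 20L, 30L, 40L, 50L)
)

# step 1: per group of y, keep the row numbers (.I) where x is the group minimum
idx <- toy2[, .I[x == min(x)], by = y]  # columns: y and V1 (V1 = row numbers)

# step 2: subset the original table by those row numbers
res2 <- toy2[idx[, V1]]
res2
```

Ties are kept: group `B` has two rows with the minimal `x`, and both come back, which matches the joined output above.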