select unique combinations of some columns in R, and random value for another column

Question 1

I figured out a fast and simple solution.

First, randomly permute the rows:

myD <- myD[sample(nrow(myD)), ]

Next, keep only the first row for each unique combination of x and y:

myD <- myD[!duplicated(myD[,c("x","y")]),]
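
For anyone who wants to try the two steps end to end, here is a minimal sketch. The example data frame is made up for illustration (two key columns x and y, two value columns a and b), and the names shuffled and picked are hypothetical. The final reordering step is optional.

```r
# Made-up example data: key columns x, y and value columns a, b
set.seed(3)
myD <- data.frame(x = c("s", "s", "s", "t", "t", "t"),
                  y = c("u", "u", "v", "v", "w", "w"),
                  a = rnorm(6), b = rnorm(6))

# Step 1: shuffle the rows; sample(n) returns a random permutation of 1..n
shuffled <- myD[sample(nrow(myD)), ]

# Step 2: keep the first row encountered for each (x, y) combination
picked <- shuffled[!duplicated(shuffled[, c("x", "y")]), ]

# Optional: sort by the key columns so the result has a stable order
picked <- picked[order(picked$x, picked$y), ]
picked
```

Because duplicated() flags every row after the first occurrence of a combination, shuffling first is what makes the surviving row a random one.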

Question 2

I have not built data to test this on, but I have found dplyr to be faster than plyr, so this command:

library(dplyr)

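# %>% replaces the old %.% operator, which was removed from dplyr
# a[1] and b[1] take the first row of each group, so shuffle the rows first
df_sampled <- myD %>%
  group_by(x, y) %>%
  summarize(a = a[1], b = b[1])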

ought to give you better performance.
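
Note that a[1] and b[1] take the first row of each group, so the result is only random if the rows were shuffled beforehand. As an alternative sketch, assuming dplyr 1.0.0 or later (where slice_sample() is available), one random row per group can be drawn directly. The example data below are made up for illustration and I have not benchmarked this variant.

```r
library(dplyr)

# Made-up example data: key columns x, y and value columns a, b
set.seed(3)
myD <- data.frame(x = c("s", "s", "s", "t", "t", "t"),
                  y = c("u", "u", "v", "v", "w", "w"),
                  a = rnorm(6), b = rnorm(6))

df_sampled <- myD %>%
  group_by(x, y) %>%
  slice_sample(n = 1) %>%  # draw one random row within each (x, y) group
  ungroup()                # drop the grouping so later steps see a plain tibble
df_sampled
```

This keeps a and b from the same source row, since whole rows are sampled rather than individual columns.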

Question 3

Since speed is important here, I would suggest combining the data.table package with the sample function. data.table can do many of the same things as plyr, but usually much faster. Something like this might work:

#Make fake data
set.seed(3)
myD <- data.frame(x=c("s","s","s","t","t","t"),y=c("u","u","v","v","w","w"),
    a=rnorm(6),b=rnorm(6))

#See data
myD
#   x y           a           b
# 1 s u -0.96193342  0.08541773
# 2 s u -0.29252572  1.11661021
# 3 s v  0.25878822 -1.21885742
# 4 t v -1.15213189  1.26736872
# 5 t w  0.19578283 -0.74478160
# 6 t w  0.03012394 -1.13121857

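library(data.table)

myD <- data.table(myD)
# Pick one random row index within each (x, y) group
myD[, rand.row := sample(1:.N, 1), by = c("x", "y")]
# Keep a and b from the picked row; rand.row is constant within each group
myD <- myD[, list(a = a[rand.row], b = b[rand.row]), by = c("x", "y", "rand.row")]
myD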

#    x y rand.row           a           b
# 1: s u        1 -0.96193342  0.08541773
# 2: s v        1  0.25878822 -1.21885742
# 3: t v        1 -1.15213189  1.26736872
# 4: t w        2  0.03012394 -1.13121857
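
If the helper column rand.row is not needed in the result, a more compact variant is possible. This is only a sketch under the same assumptions as above (key columns x and y); it indexes .SD, the per-group subset of the data, with a single random row number. The names dt and picked are hypothetical, and I have not timed it against the version above.

```r
library(data.table)

# Made-up example data: key columns x, y and value columns a, b
set.seed(3)
dt <- data.table(x = c("s", "s", "s", "t", "t", "t"),
                 y = c("u", "u", "v", "v", "w", "w"),
                 a = rnorm(6), b = rnorm(6))

# For each (x, y) group, .N is the group size and .SD holds the group's rows;
# sample(.N, 1L) draws one row number between 1 and .N
picked <- dt[, .SD[sample(.N, 1L)], by = .(x, y)]
picked
```

On very large tables with many small groups, subsetting .SD per group can be slow, so the two-step version with rand.row may still be the better choice when speed matters.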