How to subset a data frame to the one observation per patient with the lowest value of a variable

Question 1

Assuming your data frame is called df, use the ddply function in the plyr package:

library(plyr)
# For each patient, keep the row(s) whose pt_visit equals that patient's minimum
firstObs <- ddply(df, "PatientID", function(x) x[x$pt_visit == min(x$pt_visit), ])

Question 2

I would use the data.table package:

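library(data.table)
Data <- data.table(Data)
# Sorting by Patient_ID, then pt_visit, puts each patient's lowest pt_visit first
setkey(Data, Patient_ID, pt_visit)
# Take the first row of each patient's group
Data[, .SD[1], by = Patient_ID]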

Question 3

Assuming that the Patient ID column is actually named Patient_ID, here are a few approaches. DF is assumed to be the name of the input data frame:

sqldf

library(sqldf)

sqldf("select Patient_ID, Tender, Swollen, min(pt_visit) pt_visit 
   from DF 
   group by Patient_ID")

or

sqldf("select *, min(pt_visit) pt_visit from DF group by Patient_ID")[-ncol(DF)]

Note: The above two alternatives use an extension to SQL found only in SQLite, so be sure you are using the SQLite backend. (SQLite is the default backend for sqldf unless RH2, RPostgreSQL or RMySQL is loaded.)

subset/ave

subset(DF, ave(pt_visit, Patient_ID, FUN = rank) == 1)

Note: This relies on there being no duplicate pt_visit values within the same Patient_ID. If there were, we would need to specify the ties.method= argument to rank.
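
To illustrate the tie case, here is a minimal sketch with a made-up data frame (all values are hypothetical) in which patient 101 has two rows sharing the lowest pt_visit. Because ave treats its extra arguments as grouping variables, the tie rule is set inside an anonymous function rather than passed to ave directly:

```r
# Hypothetical data: patient 101 has two rows tied at the lowest pt_visit
DF <- data.frame(
  Patient_ID = c(101, 101, 101, 102, 102),
  Tender     = c(1, 2, 3, 9, 4),
  Swollen    = c(10, 8, 7, 5, 6),
  pt_visit   = c(6, 6, 12, 18, 24)
)

# With the default ties.method = "average" the tied rows both get rank 1.5,
# so patient 101 drops out of the result entirely
default_rank <- subset(DF, ave(pt_visit, Patient_ID, FUN = rank) == 1)

# ties.method = "first" breaks ties by row order, keeping one row per patient
first_rank <- subset(
  DF,
  ave(pt_visit, Patient_ID,
      FUN = function(v) rank(v, ties.method = "first")) == 1
)
```

Which tied row to keep is a judgment call; "first" simply keeps the one that appears earliest in DF.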

Question 4

I almost think there should be a subset parameter named "by" that would do the same as it does in data.table. Here is a base R solution:

do.call(rbind,  lapply( split(dfr, dfr$PatientID), 
                  function(x) x[which.min(x$pt_visit),] ) )

    PatientID Tender Swollen pt_visit
101       101      1      10        6
102       102      9       5       18
103       103      5       2       12

I guess you can see why @hadley built 'plyr'.
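
For anyone who wants to run the base solution, here is a minimal sketch with made-up input. The extra visits in dfr are hypothetical; only the lowest-visit row for each patient was chosen to match the output shown above.

```r
# Hypothetical input: three patients with two visits each
dfr <- data.frame(
  PatientID = c(101, 101, 102, 102, 103, 103),
  Tender    = c(1, 4, 7, 9, 5, 3),
  Swollen   = c(10, 6, 8, 5, 2, 1),
  pt_visit  = c(6, 12, 24, 18, 12, 18)
)

# Split by patient, keep the row with the smallest pt_visit, then restack
firstVisit <- do.call(rbind, lapply(split(dfr, dfr$PatientID),
                                    function(x) x[which.min(x$pt_visit), ]))
print(firstVisit)
```

Note that which.min returns the position of the first minimum, so a patient with tied lowest pt_visit values still contributes exactly one row.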