Not sure if you found what you were looking for, but I find some of these sheets less than helpful.
If you are using the caret package, the following call lists the required tuning parameters:
getModelInfo()$gbm$parameters
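For reference, that call returns a small table of the four tuning parameters caret exposes for method = "gbm"; roughly the following (the exact labels may vary a bit between caret versions, this is from memory rather than a fresh run):
#           parameter   class                   label
# 1           n.trees numeric # Boosting Iterations
# 2 interaction.depth numeric          Max Tree Depth
# 3         shrinkage numeric               Shrinkage
# 4    n.minobsinnode numeric Min. Terminal Node Size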
Here are some rules of thumb for running GBM:
- interaction.depth: the default is 1, and on most data sets that seems
adequate, but on a few I have found that testing odd values up to the
max (see the short sketch after this list) has given better results. The
max value I have seen for this parameter is floor(sqrt(NCOL(training))).
- shrinkage: the smaller the number, the better the predictive value,
the more trees required, and the higher the computational cost. Testing
values on a small subset of data with something like shrinkage =
seq(.0005, .05, .0005) can be helpful in pinning down the ideal value.
- n.minobsinnode: the default is 10, and generally I don't mess with it.
I have tried c(5, 10, 15, 20) on small sets of data and didn't really
see an adequate return for the computational cost.
- n.trees: the smaller the shrinkage, the more trees you should have.
Start with n.trees = (1:50)*50 and adjust accordingly.
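For example, generating the odd candidate depths up to that cap could look something like this (assuming your data frame is called training, as in the setup below):
# candidate tree depths: odd values from 1 up to floor(sqrt(NCOL(training)))
depth_max <- floor(sqrt(NCOL(training)))
depth_candidates <- seq(1, depth_max, by = 2)
depth_candidates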
Example setup using the caret package:
library(caret)
library(parallel)
library(doMC)
registerDoMC(cores = 20)
# The tuning parameters caret exposes for gbm
getModelInfo()$gbm$parameters
# Max shrinkage for gbm
nl <- nrow(training)
max(0.01, 0.1*min(1, nl/10000))
# Max value for interaction.depth
floor(sqrt(NCOL(training)))
gbmGrid <- expand.grid(interaction.depth = c(1, 3, 6, 9, 10),
                       n.trees = (1:50)*50,
                       shrinkage = seq(.0005, .05, .0005),
                       n.minobsinnode = 10) # you can also put something like c(5, 10, 15, 20)
fitControl <- trainControl(method = "repeatedcv",
                           repeats = 5,
                           preProcOptions = list(thresh = 0.95),
                           ## Estimate class probabilities
                           classProbs = TRUE,
                           ## Evaluate performance using
                           ## the following function
                           summaryFunction = twoClassSummary)
# Method + Date + distribution
set.seed(1)
system.time(GBM0604ada <- train(Outcome ~ ., data = training,
                                distribution = "adaboost",
                                method = "gbm", bag.fraction = 0.5,
                                nTrain = round(nrow(training) * .75),
                                trControl = fitControl,
                                verbose = TRUE,
                                tuneGrid = gbmGrid,
                                ## Specify which metric to optimize
                                metric = "ROC"))
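Once it finishes, the standard caret accessors are a quick way to see what the search found (GBM0604ada is just the fitted object from the call above):
# best combination of tuning parameters found
GBM0604ada$bestTune
# resampled ROC/Sens/Spec for every candidate in the grid
head(GBM0604ada$results)
# plot performance across the tuning grid
plot(GBM0604ada)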
Things can change depending on your data (like the distribution), but I have found the key is to play with gbmGrid until you get the outcome you are looking for. The settings as they are now would take a long time to run, so modify them as your machine and time allow.
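If you want a rough idea of what a trimmed-down first pass could look like, something along these lines works (the particular values here are just an illustration, not a recommendation):
# a much coarser grid for a quick first pass
gbmGridSmall <- expand.grid(interaction.depth = c(1, 3, 5),
                            n.trees = (1:10)*100,
                            shrinkage = c(0.001, 0.01, 0.05),
                            n.minobsinnode = 10)
nrow(gbmGridSmall) # 90 candidates instead of the 25,000 in gbmGrid above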
To give you a ballpark of the computation involved, I run this on a Mac Pro with 12 cores and 64 GB of RAM.