Question

Suppose I have 1 response variable Y and 2 predictors X1 and X2, such as the following

Y    X1   X2
2.3  1.1  1.2
2.5  1.24 1.17
......

Assuming I have a strong belief the following model works well

 fit <- lm(Y ~ poly(X1,2) + X2) 

in other words, there is a quadratic relationship between Y and X1 and a linear relationship between Y and X2.

Now here are my questions:

  1. How do I find the optimal pair (x1, x2) at which the fitted model reaches its maximum value?

  2. Now assuming X2 has to be fixed at some particular value, how do I find the optimal x1 that maximizes the fitted value?

Solution

So here is an empirical way to do this:

# create some random data...
set.seed(1)
X1 <- 1:100
X2 <- sin(2*pi/100*(1:100))
df <- data.frame(Y=3 + 5*X1 -0.2 * X1^2 + 100*X2 + rnorm(100,0,5),X1,X2)
fit <- lm(Y ~ poly(X1,2,raw=T) + X2, data=df)
# X1 and X2 unconstrained
df$pred <- predict(fit)
result  <- with(df,df[pred==max(pred),])
result
#           Y X1        X2     pred
# 19 122.8838 19 0.9297765 119.2087

# max(Y|X2=0)
newdf       <- data.frame(Y=df$Y, X1=df$X1, X2=0)
newdf$pred2 <- predict(fit,newdata=newdf)
result2     <- with(newdf,newdf[pred2==max(pred2),])
result2
#           Y X1 X2    pred2
#12 104.6039 12  0 35.09141

So in this example, when X1 and X2 are unconstrained, the maximum fitted value is 119.2 and occurs at (X1,X2) = (19, 0.930). When X2 is constrained to 0, the maximum fitted value is 35.1 and occurs at (X1,X2) = (12, 0).

There are a couple of things to consider:

  1. These are global maxima within the space of your data. In other words, if your real data has a large number of variables there may be local maxima that you will not find this way.
  2. This method has resolution only as fine as your dataset, so if the true maximum falls between your data points you will not find it this way (the grid sketch after this list is one way to mitigate that).
  3. This technique is restricted to the bounds of your dataset, so if the true maximum lies outside those bounds you will not find it. On the other hand, using a model outside the bounds of your data is, IMHO, the definition of reckless.
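
Regarding point 2, one workaround (a sketch, not part of the original answer; the 200-point grid size is an arbitrary choice) is to predict on a finer grid spanning the observed ranges of the predictors and take the maximum of those predictions:

# sketch: evaluate the fitted model on a finer grid within the observed ranges
grid <- expand.grid(X1 = seq(min(df$X1), max(df$X1), length.out = 200),
                    X2 = seq(min(df$X2), max(df$X2), length.out = 200))
grid$pred <- predict(fit, newdata = grid)
grid[which.max(grid$pred), ]   # finer-resolution estimate of the maximizing (X1, X2)

Note that this still cannot see beyond the bounds of the data (point 3).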

Finally, you should be aware that poly(...) produces orthogonal polynomials by default. These will generate a fit, but the coefficients will be very difficult to interpret. If you really want a quadratic fit of the form a + b*x + c*x^2, you are better off specifying that explicitly with Y ~ X1 + I(X1^2) + X2, or using raw=TRUE in the call to poly(...).
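
For example (a quick check, not part of the original answer), the two parameterizations produce identical fitted values; only the coefficients differ:

fit_raw  <- lm(Y ~ X1 + I(X1^2) + X2, data = df)   # explicit quadratic term
fit_orth <- lm(Y ~ poly(X1, 2) + X2, data = df)    # orthogonal polynomial basis
all.equal(fitted(fit_raw), fitted(fit_orth))        # TRUE: same fitted values
coef(fit_raw)    # maps directly onto a + b*X1 + c*X1^2 + d*X2
coef(fit_orth)   # coefficients in the orthogonal basis, harder to interpret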

OTHER TIPS

credit to @sashkello

Basically, I have to extract the coefficients from the lm object and multiply them by the corresponding terms to reconstruct the fitted formula and proceed from there.

I don't think this is very efficient. What if the regression has hundreds of predictors?
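
For what it's worth, here is a hedged sketch of that idea for this particular model, assuming the raw-coefficient fit from the answer above. Since the fitted surface is b0 + b1*X1 + b2*X1^2 + b3*X2 and the X2 term is linear, the fitted value is maximized in X1 at the vertex of the parabola, -b1/(2*b2) (valid when b2 < 0), whatever value X2 is fixed at:

# sketch: analytic maximum in X1 from the raw-coefficient fit
b <- coef(fit)                       # (Intercept), X1, X1^2 and X2 terms
x1_opt <- -b[2] / (2 * b[3])         # vertex of the quadratic in X1, valid when b[3] < 0
x2_fixed <- 0                        # the fixed X2 value from question 2
y_opt <- predict(fit, newdata = data.frame(X1 = x1_opt, X2 = x2_fixed))
c(X1 = unname(x1_opt), pred = unname(y_opt))

For the simulated data above this should land near X1 = 12.5 (the vertex of the true quadratic 3 + 5*X1 - 0.2*X1^2), consistent with the empirical result of 12. With hundreds of predictors, a general-purpose optimizer such as optim() wrapped around predict() would likely be more practical than writing out the formula by hand.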

Licensed under: CC-BY-SA with attribution