Problem

I use R only a little bit and never use data frames, which makes understanding the correct use of predict difficult. I have my data in plain matrices, not data frames, call them a and b, which are N x p and M x p matrices respectively. I can run the regression lm(a[,1] ~ a[,-1]). I would like to use the resulting lm object to predict b[,1] from b[,-1]. My naive guess of predict(lm(a[,1] ~ a[,-1]), b[,-1]) doesn't work. What's the right syntax to use the lm to get a vector of predictions?


Solution

You can store a whole matrix in one column of a data.frame:

x <- a [, -1]
y <- a [,  1]
data <- data.frame (y = y, x = I (x))
str (data)
## 'data.frame':    10 obs. of  2 variables:
## $ y: num  0.818 0.767 -0.666 0.788 -0.489 ...
## $ x: AsIs [1:10, 1:9] 0.916274.... 0.386565.... 0.703230.... -2.64091.... 0.274617.... ...

model <- lm (y ~ x)
newdata <- data.frame (x = I (b [, -1]))
predict (model, newdata) 
##         1         2 
## -3.795722 -4.778784 

This technique is explained in the paper about the pls package (Mevik, B.-H. and Wehrens, R.: The pls Package: Principal Component and Partial Least Squares Regression in R. Journal of Statistical Software, 2007, 18, 1-24).

Another example, with a spectroscopic data set (quinine fluorescence), is in vignette("flu") of my package hyperSpec.
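For reference, here is a minimal self-contained version of the trick above; the matrix sizes and the seed are made up for illustration:

```r
# Minimal sketch of the I() technique: keep the predictor matrix as a
# single AsIs column so lm() and predict() see matching structures.
set.seed(1)
a <- matrix(rnorm(10 * 6), 10, 6)   # training data, N = 10, p = 6
b <- matrix(rnorm( 3 * 6),  3, 6)   # new data,      M = 3

train <- data.frame(y = a[, 1], x = I(a[, -1]))
model <- lm(y ~ x, data = train)

newdata <- data.frame(x = I(b[, -1]))
pred <- predict(model, newdata)     # length-M vector of predictions
length(pred)                        # 3
```

Because `x` is stored as one AsIs matrix column, the column name matches between the training and prediction data frames, which is exactly what predict() needs.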

Other tips

To make data.frames out of your matrices, simply do:

m = matrix(runif(100), 10, 10)
df = as.data.frame(m)

And perform linear regression:

lm_result = lm(V1 ~ V2, df)
predicted_values = predict(lm_result, b)

Or for multiple regression:

lm_result = lm(V1 ~ V2 + V3 + V4, df)
predicted_values = predict(lm_result, b)

assuming b is a data.frame (e.g. converted with as.data.frame) in which the columns V1 - V4 are present.
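Note that predict() matches the columns of newdata by name, so b has to be converted the same way, ending up with the same V1, V2, ... names that as.data.frame() assigns. A hedged sketch with made-up data:

```r
# predict() looks up newdata columns by name, so convert b with
# as.data.frame() too, which assigns the same default V1 ... V10 names.
set.seed(2)
df   <- as.data.frame(matrix(runif(100), 10, 10))   # columns V1 ... V10
b_df <- as.data.frame(matrix(runif( 50),  5, 10))   # same column names

lm_result        <- lm(V1 ~ V2 + V3 + V4, data = df)
predicted_values <- predict(lm_result, b_df)
length(predicted_values)                            # 5
```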

You could compute the predictions manually:

> fit <- lm(a[,1] ~ a[,-1])
> fit$coefficients[1] + b[,-1] %*% fit$coefficients[-1]
     [,1]
[1,]    1
[2,]    2
[3,]    5

Here, fit$coefficients[1] is the intercept, and fit$coefficients[-1] are the remaining coefficients (and %*% is matrix multiplication).
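As a sanity check, the manual computation agrees with predict() used via the data.frame-with-matrix-column trick from the accepted answer (simulated data, sizes made up for illustration):

```r
# Manual prediction vs. predict(): both should give the same numbers,
# since the two fits use the same design matrix.
set.seed(3)
a <- matrix(rnorm(20 * 4), 20, 4)
b <- matrix(rnorm( 5 * 4),  5, 4)

fit    <- lm(a[, 1] ~ a[, -1])
manual <- fit$coefficients[1] + b[, -1] %*% fit$coefficients[-1]

# cross-check using the I() approach
train <- data.frame(y = a[, 1], x = I(a[, -1]))
check <- predict(lm(y ~ x, data = train), data.frame(x = I(b[, -1])))
all.equal(as.vector(manual), as.vector(check))      # TRUE
```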

I'm using lm inside a function to cycle through many different linear models and then perform leave-one-out cross validation for predictions. @PaulHiemstra's sprintf suggestion did the trick.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow