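I posted earlier today about an error I was getting when using the predict function. I was able to get that corrected, and thought I was on the right track.

I have a number of observations (actuals), and I have a few data points that I want to extrapolate or predict. I used lm to create the model, then tried to use predict with the actual values that will serve as the predictor input.

This code is all repeated from my previous post, but here it is: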

df <- read.table(text = '
     Quarter Coupon      Total
1   "Dec 06"  25027.072  132450574
2   "Dec 07"  76386.820  194154767
3   "Dec 08"  79622.147  221571135
4   "Dec 09"  74114.416  205880072
5   "Dec 10"  70993.058  188666980
6   "Jun 06"  12048.162  139137919
7   "Jun 07"  46889.369  165276325
8   "Jun 08"  84732.537  207074374
9   "Jun 09"  83240.084  221945162
10  "Jun 10"  81970.143  236954249
11  "Mar 06"   3451.248  116811392
12  "Mar 07"  34201.197  155190418
13  "Mar 08"  73232.900  212492488
14  "Mar 09"  70644.948  203663201
15  "Mar 10"  72314.945  203427892
16  "Mar 11"  88708.663  214061240
17  "Sep 06"  15027.252  121285335
18  "Sep 07"  60228.793  195428991
19  "Sep 08"  85507.062  257651399
20  "Sep 09"  77763.365  215048147
21  "Sep 10"  62259.691  168862119', header=TRUE)

str(df)
'data.frame':   21 obs. of  3 variables:
 $ Quarter   : Factor w/ 24 levels "Dec 06","Dec 07",..: 1 2 3 4 5 7 8 9 10 11 ...
 $ Coupon: num  25027 76387 79622 74114 70993 ...
 $ Total: num  132450574 194154767 221571135 205880072 188666980 ...

Code:

model <- lm(df$Total ~ df$Coupon, data=df)

> model

Call:
lm(formula = df$Total ~ df$Coupon)

Coefficients:
(Intercept)    df$Coupon  
  107286259         1349 

Prediction code (based on previous help):

(These are the predictor values I want to use to get the predicted values)

Quarter = c("Jun 11", "Sep 11", "Dec 11")
Total = c(79037022, 83100656, 104299800)
Coupon = data.frame(Quarter, Total)

Coupon$estimate <- predict(model, newdate = Coupon$Total)

Now, when I run that, I get this error message:

Error in `$<-.data.frame`(`*tmp*`, "estimate", value = c(60980.3823396919,  : 
  replacement has 21 rows, data has 3

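The original data frame that I used to build the model had 21 observations. I am now trying to predict 3 values based on the model.

Either I don't really understand this function, or there is an error in my code.

Help would be appreciated.

Thanks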


Solution

First, you want to use

model <- lm(Total ~ Coupon, data=df)

not model <-lm(df$Total ~ df$Coupon, data=df).

Second, by saying lm(Total ~ Coupon), you are fitting a model that uses Total as the response variable, with Coupon as the predictor. That is, your model is of the form Total = a + b*Coupon, with a and b the coefficients to be estimated. Note that the response goes on the left side of the ~, and the predictor(s) on the right.

Because of this, when you ask R to give you predicted values for the model, you have to provide a set of new predictor values, i.e. new values of Coupon, not Total.
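
A minimal sketch of that point, using only the first five rows of the question's data. The three new Coupon values are hypothetical and chosen purely for illustration:

```r
# First five rows of the question's data
df <- data.frame(
  Coupon = c(25027.072, 76386.820, 79622.147, 74114.416, 70993.058),
  Total  = c(132450574, 194154767, 221571135, 205880072, 188666980)
)

# Total is the response, Coupon is the predictor
model <- lm(Total ~ Coupon, data = df)

# newdata needs a column named after the predictor, i.e. Coupon
# (hypothetical values, for illustration only)
new.df <- data.frame(Coupon = c(30000, 50000, 80000))

pred <- predict(model, newdata = new.df)
length(pred)  # one prediction per row of new.df
```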

Third, judging by your specification of newdata, it looks like you're actually after a model to fit Coupon as a function of Total, not the other way around. To do this:

model <- lm(Coupon ~ Total, data=df)
new.df <- data.frame(Total=c(79037022, 83100656, 104299800))
predict(model, new.df)

Other tips

Thanks Hong, that was exactly the problem I was running into. The error you get suggests that the number of rows is wrong, but the problem is actually that the model has been trained using a command that ends up with the wrong names for parameters.

This is a critical detail that is entirely non-obvious for lm and similar functions. Some tutorials use lines like lm(olive$Area ~ olive$Palmitic), which ends up with a variable named olive$Palmitic, NOT Palmitic, so a data frame created with anewdata <- data.frame(Palmitic=2) can't then be used for prediction. If you use lm(Area ~ Palmitic, data=olive), the variable names are right and prediction works.

The real problem is that the error message does not indicate the problem at all:

Warning message: 'anewdata' had 1 rows but variable(s) found to have X rows
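
To make the naming issue visible, here is a small sketch that fits the model both ways on the first five rows of the question's data and compares the coefficient names. The object names good, bad, and new.df are just labels chosen for this example:

```r
# First five rows of the question's data
df <- data.frame(
  Coupon = c(25027.072, 76386.820, 79622.147, 74114.416, 70993.058),
  Total  = c(132450574, 194154767, 221571135, 205880072, 188666980)
)

good <- lm(Coupon ~ Total, data = df)  # predictor is named "Total"
bad  <- lm(df$Coupon ~ df$Total)       # predictor is named "df$Total"

names(coef(good))
names(coef(bad))

new.df <- data.frame(Total = c(79037022, 83100656, 104299800))

# With the good fit, predict() finds Total in new.df
n_good <- length(predict(good, new.df))

# With the bad fit, predict() looks for df$Total, finds the original df,
# and returns one value per training row (with a warning, silenced here)
n_bad <- length(suppressWarnings(predict(bad, new.df)))

c(n_good, n_bad)
```

The second length matches the number of training rows rather than the number of rows in new.df, which is why assigning the result to a 3-row data frame fails with a row-count error.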

In your predict call you wrote newdate instead of newdata, so check that first. Then, with the model fitted as lm(Coupon ~ Total, data=df), just use Coupon$estimate <- predict(model, Coupon) and it will work.
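
A short sketch of what the misspelling does, again on the first five rows of the question's data. As far as I can tell, the unknown newdate argument is absorbed by `...`, so predict behaves as if no new data were supplied and returns the fitted values for the training rows:

```r
# First five rows of the question's data
df <- data.frame(
  Coupon = c(25027.072, 76386.820, 79622.147, 74114.416, 70993.058),
  Total  = c(132450574, 194154767, 221571135, 205880072, 188666980)
)

model  <- lm(Coupon ~ Total, data = df)
new.df <- data.frame(Total = c(79037022, 83100656, 104299800))

# Misspelled argument: no error, but new.df is never used
typo <- predict(model, newdate = new.df)
length(typo)  # number of training rows, not rows of new.df
same_as_fitted <- isTRUE(all.equal(unname(typo), unname(fitted(model))))

# Correct spelling: one prediction per row of new.df
ok <- predict(model, newdata = new.df)
length(ok)
```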

To avoid the error, the important point about the new dataset is the name of the independent variable: it must be the same as the name reported in the model. Alternatively, you can nest the two functions without creating a separate new dataset:

model <- lm(Coupon ~ Total, data=df)
predict(model, data.frame(Total=c(79037022, 83100656, 104299800)))

Pay attention to how the model is specified. The next two commands look similar, but with the predict function the first works and the second does not.

model <- lm(Coupon ~ Total, data=df) # OK: the predictor is named Total
model <- lm(df$Coupon ~ df$Total) # Not OK: the predictor is named df$Total
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow