Your issue boils down to the fact that you are using non-syntactic variable names.
These should be used with caution, and without any expectation that package authors will have anticipated the issues that may arise.
To quote from the help for formula:
"Variable names can be quoted by backticks `like this` in formulae, although there is no guarantee that all code using formulae will accept such non-syntactic names."
The issue is in how xvars is created in rlm.formula:
xvars <- as.character(attr(mt, "variables"))[-1L]
and in how it is then used later on:
xlev <- if (length(xvars) > 0L) {
  xlev <- lapply(mf[xvars], levels)
  xlev[!sapply(xlev, is.null)]
}
which, as you show, does not work.
Converting the variables to character creates backtick-quoted entries for non-syntactic names. If the names are already backticked, it creates double-backticked names.
For example, if the column name was "x1^2", the element in xvars becomes "`x1^2`".
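As a minimal sketch of the mismatch, the snippet below builds a small, made-up data frame `d` with a non-syntactic column name and compares the names produced by the `xvars` expression with the names stored in the model frame. The data and the object names `d`, `mf` and `mt` are assumptions for illustration only.

```r
# Hypothetical data: the column name "x1^2" is non-syntactic,
# so check.names = FALSE is needed to keep it unchanged.
d <- data.frame(y = c(1, 2, 3, 4), `x1^2` = c(1, 4, 9, 16),
                check.names = FALSE)

mf <- model.frame(y ~ `x1^2`, data = d)
mt <- attr(mf, "terms")

# Same expression as the one quoted from rlm.formula
xvars <- as.character(attr(mt, "variables"))[-1L]

print(xvars)       # names derived from the terms object
print(names(mf))   # names actually stored in the model frame

# Which entries of xvars can be used to index the model frame?
print(xvars %in% names(mf))
```

Any entry of `xvars` that is not in `names(mf)` is one that `mf[xvars]` cannot find.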
This fails with [.data.frame, for example:
> x <- data.frame(`a` = 1)
> x[,'`a`']
Error in `[.data.frame`(x, , "`a`") : undefined columns selected
The lookup fails because the column name is 'a', not `a`.
If you backtick the column name itself, i.e. if the column name was "`x1^2`", the element in xvars becomes "``x1^2``", which again is not a column in your data.frame.
The reason lm works is that it does not attempt this definition and use of xvars; instead, it uses model.matrix to define the design matrix x directly and passes it to lm.fit.
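To illustrate that route, here is a rough sketch that builds the design matrix with model.matrix and hands it to lm.fit, using a made-up data frame with a non-syntactic column name. It is not a copy of lm's internals; the data and the names `d`, `X`, `fit` and `fit_lm` are assumptions for illustration.

```r
# Hypothetical data with a non-syntactic column name
d <- data.frame(y = c(1.1, 3.9, 9.2, 15.8, 25.1),
                `x1^2` = c(1, 4, 9, 16, 25),
                check.names = FALSE)

# Build the design matrix straight from the formula and data
X <- model.matrix(y ~ `x1^2`, data = d)

# Fit by least squares on the design matrix
fit <- lm.fit(X, d$y)
print(fit$coefficients)

# The formula interface gives the same estimates
fit_lm <- lm(y ~ `x1^2`, data = d)
print(coef(fit_lm))
```

Nothing here indexes the data frame by a deparsed variable name, which is the step that trips up rlm.formula.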
If you want to fit the model y ~ x1 + x2 + x1:x2 + x1^2 + x2^2, then you can use
rlm(y ~ x1*x2 + I(x1^2) + I(x2^2))
In this case you only need three columns in your data.frame (or objects in your evaluation environment): y, x1 and x2. The I() function lets you perform arithmetic operations on a variable inside the formula, as I is parsed as a symbol by terms.formula.
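A small sketch of the I() approach follows, assuming the MASS package is installed and using simulated data. The coefficients used to generate `y` and the object names `d` and `fit` are made up for illustration.

```r
library(MASS)  # provides rlm

# In a formula, ^ is a formula operator, so x1^2 reduces to x1 ...
print(attr(terms(y ~ x1^2), "term.labels"))
# ... whereas I() keeps the arithmetic meaning as its own term.
print(attr(terms(y ~ I(x1^2)), "term.labels"))

# Simulated data: only y, x1 and x2 are stored as columns
set.seed(1)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
d$y <- 1 + d$x1 + 0.5 * d$x2 + 0.3 * d$x1 * d$x2 +
  d$x1^2 - d$x2^2 + rnorm(50, sd = 0.1)

# Squared terms are computed inside the formula, so all names stay syntactic
fit <- rlm(y ~ x1 * x2 + I(x1^2) + I(x2^2), data = d)
print(names(coef(fit)))
```

Because the squares are computed on the fly, there is no need for columns named "x1^2" or "x2^2" in the data at all.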