Here's a simple example which seems to reproduce your problem:
> library(survival)
> (df1 <- data.frame(t1=seq(1:6),
s1=rep(c(0, 1), 3),
te1=c(rep(0, 3), rep(1, 3)),
pa1=c(0,0,1,0,0,0)
))
t1 s1 te1 pa1
1 1 0 0 0
2 2 1 0 0
3 3 0 0 1
4 4 1 1 0
5 5 0 1 0
6 6 1 1 0
> (coxph(Surv(t1, s1) ~ te1*pa1, data=df1))
Call:
coxph(formula = Surv(t1, s1) ~ te1 * pa1, data = df1)
coef exp(coef) se(coef) z p
te1 -23 9.84e-11 58208 -0.000396 1
pa1 -23 9.84e-11 100819 -0.000229 1
te1:pa1 NA NA 0 NA NA
Now lets look for 'perfect classification' like so:
> (xtabs( ~ s1+te1, data=df1))
te1
s1 0 1
0 2 1
1 1 2
> (xtabs( ~ s1+pa1, data=df1))
pa1
s1 0 1
0 2 1
1 3 0
Note that a value of 1
for pa1
exactly predicts having a status s1
equal to 0
. That is to say, based on your data, if you know that pa1==1
then you can be sure than s1==0
. Thus fitting Cox's model is not appropriate in this setting and will result in numerical errors.
This can be seen with
> coxph(Surv(t1, s1) ~ pa1, data=df1)
giving
Warning message:
In fitter(X, Y, strats, offset, init, control, weights = weights, :
Loglik converged before variable 1 ; beta may be infinite.
It's important to look at these cross tables before fitting models. Also it's worth starting with simpler models before considering those involving interactions.
If we add the interaction term to df1
manually like this:
> (df1 <- within(df1,
+ te1pa1 <- te1*pa1))
t1 s1 te1 pa1 te1pa1
1 1 0 0 0 0
2 2 1 0 0 0
3 3 0 0 1 0
4 4 1 1 0 0
5 5 0 1 0 0
6 6 1 1 0 0
Then check it with
> (xtabs( ~ s1+te1pa1, data=df1))
te1pa1
s1 0
0 3
1 3
We can see that it's a useless classifier, i.e. it does not help predict status s1
.
When combining all 3 terms, the fitter does manage to produce a numerical value for te1
and pe1
even though pe1
is a perfect predictor as above. However a look at the values for the coefficients and their errors shows them to be implausible.
Edit @JMarcelino: If you look at the warning message from the first coxph
model in the example, you'll see the warning message:
2: In coxph(Surv(t1, s1) ~ te1 * pa1, data = df1) :
X matrix deemed to be singular; variable 3
Which is likely the same error you're getting and is due to this problem of classification. Also, your third cross table xtabs(~ tecnologia+pais, data=dados)
is not as important as the table of status
by interaction term
. You could add the interaction term manually first as in the example above then check the cross table. Or you could say:
> with(df1,
table(s1, pa1te1=pa1*te1))
pa1te1
s1 0
0 3
1 3
That said, I notice one of the cells in your third table has a zero (conv
, PT
) meaning you have no observations with this combination of predictors. This is going to cause problems when trying to fit.
In general, the outcome should be have some values for all levels of the predictors and the predictors should not classify the outcome as exactly all or nothing or 50/50.
Edit 2 @user75782131 Yes, generally speaking xtabs
or a similar cross-table should be performed in models where the outcome and predictors are discrete i.e. have a limited no. of levels. If 'perfect classification' is present then a predictive model / regression may not be appropriate. This is true for example for logistic regression (outcome is binary) as well as Cox's model.