R: GLMNET odd behavior when model is rerun

https://stackoverflow.com/questions/18467173

26-06-2022

Question

I am trying to use LASSO for variable selection and attempted to implement it in R using the glmnet package. This is the code I have written so far:

 set.seed(1)
 library(glmnet)
 return =  matrix(ret.ff.zoo[which(index(ret.ff.zoo) == beta.df$date[1]),])
 data = matrix(unlist(beta.df[which(beta.df$date == beta.df$date[1]),][,-1]), ncol = num.factors)
 dimnames(data)[[2]] <- names(beta.df)[-1]
 model <- cv.glmnet(data, return, standardize = TRUE)
 coef(model)

This is what I obtain when I run it the first time:

 > coef(model)
 15 x 1 sparse Matrix of class "dgCMatrix"
                       1
 (Intercept) 0.009159452
 VAL         .          
 EQ          .          
 EFF         .          
 SIZE        0.018479078
 MOM         .          
 FSCR        .          
 MSCR        .          
 SY          .          
 URP         .          
 UMP         .          
 UNIF        .          
 OIL         .          
 DEI         .          
 PROD        .

BUT this is what I get when I run the SAME code once more:

 > coef(model)
 15 x 1 sparse Matrix of class "dgCMatrix"
                       1
 (Intercept) 0.008031915
 VAL         .          
 EQ          .          
 EFF         .          
 SIZE        0.021250778
 MOM         .          
 FSCR        .          
 MSCR        .          
 SY          .          
 URP         .          
 UMP         .          
 UNIF        .          
 OIL         .          
 DEI         .          
 PROD        .

I am not sure why the model behaves this way. How can I choose a final model if the coefficients change on every run? Does it use a different tuning parameter $\lambda$ on every run? I thought that cv.glmnet uses model$lambda.1se by default?!

I have just started learning about this package, and would appreciate any help I can get!

Thank you!

Solution

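The model isn't deterministic: unless you supply foldid, cv.glmnet assigns observations to cross-validation folds at random, so the cross-validation error curve, and with it lambda.1se and the coefficients at that lambda, can change from one run to the next. Run set.seed(1) immediately before your model fit to get reproducible results.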
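As a rough illustration of where the randomness comes from, the base-R sketch below draws fold labels in the style of cv.glmnet's default when no foldid is given (an assumption about its internals; make_folds() is a hypothetical helper, not part of glmnet). Two draws after a single set.seed() differ, while resetting the seed before each draw repeats the assignment exactly.

```r
# Hypothetical helper: assign each of n observations a fold label 1..nfolds
# in random order (assumed to mirror cv.glmnet's default fold assignment).
make_folds <- function(n, nfolds = 10) {
  sample(rep(seq_len(nfolds), length.out = n))
}

n <- 20

# One seed, two draws: the second draw continues from the RNG state
# left behind by the first, so the fold labels differ.
set.seed(1)
folds_a <- make_folds(n)
folds_b <- make_folds(n)

# Same seed reset before each draw: the assignment is reproduced exactly.
set.seed(1)
folds_c <- make_folds(n)
set.seed(1)
folds_d <- make_folds(n)

identical(folds_a, folds_b)  # almost surely FALSE
identical(folds_c, folds_d)  # TRUE
identical(folds_a, folds_c)  # TRUE: both are the first draw after set.seed(1)
```

Every fold still contains the same number of observations; only which observations land in which fold changes, and that is enough to move the cross-validation error curve.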

Other tips

You need to pass the same nfolds and foldid to both models; check help(cv.glmnet) for details. This makes the cross-validation folds identical, so you should get the same model when you fit the models on the same data set.
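A minimal sketch of this approach, assuming the glmnet package is installed; the simulated x and y are hypothetical stand-ins for the question's data and return. The fold labels are drawn once and passed to both cv.glmnet() calls through its foldid argument (when foldid is supplied, nfolds can be omitted). coef() on a cv.glmnet fit uses s = "lambda.1se" by default, so both fits should report the same lambda.1se and the same coefficients.

```r
library(glmnet)

# Hypothetical simulated data: 100 observations, 14 predictors,
# only the 4th predictor carries signal.
set.seed(42)
n <- 100
p <- 14
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- 0.5 * x[, 4] + rnorm(n)

# Draw the fold labels once and reuse them for every fit.
nfolds <- 10
foldid <- sample(rep(seq_len(nfolds), length.out = n))

fit1 <- cv.glmnet(x, y, foldid = foldid)
fit2 <- cv.glmnet(x, y, foldid = foldid)

# Same folds on the same data: same CV curve, same lambda.1se,
# and therefore the same coefficients from coef() (default s = "lambda.1se").
fit1$lambda.1se == fit2$lambda.1se
identical(as.matrix(coef(fit1)), as.matrix(coef(fit2)))
```

Fixing foldid removes the run-to-run variation, but the selected lambda still depends on that particular split; a different foldid can select a different lambda.1se.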

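Just a supplement to @nograpes's answer: the same seed should be set each time before fitting a model. In short, one set.seed() call only makes the next model reproducible, because each cv.glmnet fit draws random numbers for its folds and so advances the random number generator. For example,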

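 set.seed(1)
 # model1 draws its folds right after set.seed(1)
 model1 = cv.glmnet(x, y, alpha = 0, family = 'binomial')
 # model2 draws its folds from the RNG state left behind by model1
 model2 = cv.glmnet(x, y, alpha = 0, family = 'binomial')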

In the code above, model1 and model2 can end up with different coefficients, because they are cross-validated on different folds.

 set.seed(1)
 model1 = cv.glmnet(x, y, alpha = 0, family = 'binomial')
 # reset the RNG so model2 draws exactly the same folds as model1
 set.seed(1)
 model2 = cv.glmnet(x, y, alpha = 0, family = 'binomial')

Only when the same seed is set immediately before each fit are the results exactly the same.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow