How to increase the number of permutations in the GLM cross validation function `cv.glm`

https://stackoverflow.com/questions/19024153

29-06-2022

Question

I'm wondering if anyone has experience with specifying the number of permutations (repeated random splits) in the GLM cross-validation function cv.glm (package: boot). I am using K=2 to split the data into two roughly equal training and validation groups. In some cases my $delta results vary widely depending on the random seed. From the ?cv.glm help, I cannot see any option for increasing the number of permutations.

Example

library(boot)

DF <- structure(list(Y = c(0.158507483, 0.008510161, 0.002684648, 0.009587276, 
0.001803681, 0.010173461, 0.002273384, 0.00345826, 0.051424454, 
0.029937484, 0.194813452, 0.042138033, 0.022944148, 0.729585218, 
0.887009621, 0.008899131, 0.001588576, 0.0216036, 0.001409499, 
0.161051383, 0.026504919, 0.001495132, 0.059066545, 0.008317594, 
0.490868633, 0.057027831), X1 = c(0.0974369543591941, -0.11971810600977, 
-0.168908964300336, -0.0011723143713434, -0.200018273737778, 
0.0536459384966756, -0.188248143615029, -0.154736748196712, 0.0529959236206016, 
-0.152396350558232, 0.103766445240172, -0.0693365907826557, -0.114615555500542, 
0.488829422819801, 0.561719898192691, -0.0469180067616361, 0.0631502939411764, 
-0.135689617930714, 0.0343957489602316, -0.0749974069726867, 
-0.107592097416425, 0.067741017650224, -0.167713403634508, 0.275062271178857, 
0.276065626134302, -0.0926000525628916), X2 = c(-0.19192408577628, 
0.116576354094024, 0.208731289320505, -0.138772290234524, 0.364065047213473, 
-0.1574052089755, 0.285540178523006, 0.29343767019163, -0.203222931158516, 
0.0835579872715545, -0.157325117354138, -0.0242157560597033, 
-0.175123479037643, 0.174087353210292, 0.246559485637939, -0.43074835446357, 
-0.0181308378901971, 0.0525230701557242, -0.121813588478372, 
-0.0549274842561502, -0.115591654073407, -0.0190993986035446, 
0.124566313208749, 0.138414677580375, -0.0981459346380045, -0.319191657096572
)), .Names = c("Y", "X1", "X2"), class = "data.frame", row.names = c(NA, 
-26L))

fmla1 <- formula(Y ~ X1 + X2)
glm1 <- glm(fmla1, DF, family=gaussian(link="log"))
summary(glm1)

set.seed(111)
cv1 <- cv.glm(DF, glm1, K=2)
cv2 <- cv.glm(DF, glm1, K=2)
cv3 <- cv.glm(DF, glm1, K=2)
cv4 <- cv.glm(DF, glm1, K=2)

cv1$delta; cv2$delta; cv3$delta; cv4$delta # RESULTS
#[1] 0.007317702 0.005484949
#[1] 0.12918099 0.06621125
#[1]  1.029601e+31 -3.602880e+16
#[1] 0.02860412 0.01581949

Solution

Just as an example of running the function multiple times, you could do:

f <- function() cv.glm(DF, glm1, K = 2)$delta
set.seed(111)
replicate(4, f())
#             [,1]       [,2]       [,3]     [,4]
# [1,] 0.013848041 0.05176088 0.06215253 12512004
# [2,] 0.008418343 0.02743163 0.03268880  6256002

That gets you a 2 x n matrix of results, one column per repetition (here n = 4). Just increase n to taste. (Note that the model will very often not converge with the data you gave: 26 rows is too few to split in two.)
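
Once you have the replicated deltas, you will probably want to summarise them rather than read them column by column. Here is a minimal sketch of one way to do that. It does not use the question's DF: `sim`, `fit` and `n_rep` are hypothetical names, and the simulated data set (100 rows, identity link) is only an assumption chosen so that every fit converges.

```r
library(boot)

# Hypothetical stand-in data (not the question's DF): 100 rows, linear signal.
set.seed(1)
sim <- data.frame(X1 = rnorm(100), X2 = rnorm(100))
sim$Y <- 1 + 0.5 * sim$X1 - 0.3 * sim$X2 + rnorm(100, sd = 0.2)
fit <- glm(Y ~ X1 + X2, data = sim)  # gaussian family, identity link by default

# Repeat the 2-fold CV n_rep times; each column holds one run's two delta values.
n_rep <- 50
set.seed(111)
deltas <- replicate(n_rep, cv.glm(sim, fit, K = 2)$delta)

# Summarise across repetitions (row 1: raw estimate, row 2: adjusted estimate).
rowMeans(deltas)          # mean over repetitions
apply(deltas, 1, median)  # median, less sensitive to a few extreme runs
apply(deltas, 1, sd)      # spread between repetitions
```

With data like the question's, where some splits do not converge and produce huge deltas, the median is likely to be a more useful summary than the mean.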

Licensed under: CC-BY-SA with attribution
