문제

I fit a 3 dimensional contingency table (not provided here but I can if it can help) with a loglinear model, both with loglm and with glm. The two results I get in terms of coefficients are:

> coefficients(nodnox_loglm_model)
$`(Intercept)`
[1] 10.18939

$w
       0.05         0.1        0.15         0.2        0.25         0.3        0.35         0.4        0.45 
-1.04596513 -0.41193617 -0.08840858  0.06407334 -0.06862606  0.02999039  0.17084795  0.45838071  0.35307375 
        0.5 
 0.53856982 

$s
          2           3           4           5 
 0.36697307  0.15164360 -0.48264571 -0.03597096 

and

> coefficients(nodnox_glm_model)
(Intercept)          s3          s4          s5        w0.1       w0.15        w0.2       w0.25        w0.3 
  9.5104005  -0.2153295  -0.8496188  -0.4029440   0.6340290   0.9575566   1.1100385   0.9773391   1.0759555 
      w0.35        w0.4       w0.45        w0.5 
  1.2168131   1.5043458   1.3990389   1.5845350 

I know that these two methods have different numerical procedure - I don't care about that - all I want to know is how can I relate the glm coefficients to the loglm coefficients?

All I found on the internet and the documentation I searched before coming to stackoverflow is this note:

The glm coefficient table works just like the summary for ANOVA produced by lm: the level alphabetically first (s2,w0.5) is used as an intercept, and all subsequent levels are tested against the first (thus the remaining coefficients are differences from the mean, not means themselves).

To me, though, this is not enough to understand how to get the coefficients from the glm output in the form of loglm. Now, your question might be: "why not use loglm directly?" Loglm would not work in my case, (which is not the one I am comparing here, but it has a 5 dimensional table with some zeros. So if I use loglm on the original table, it gives me all the coefficients as NaNs). So I am stuck with glm and I really want to get the coefficients as in loglm.

Thanks a lot!

도움이 되었습니까?

해결책

It seems you have a two-way cross table with 10 levels of factor w and 5 levels of factor s with no interactions in the model. With glm(), the default coding scheme for categorical variables is treatment coding where the first group in a factor is the reference level, and the respective parameter of each remaining group is its difference to this reference. The (Intercept) estimate is for the cell with all groups = reference level for their factor.

With loglm(), the parameters are for deviation coding, meaning that each group gets its own parameter, and the parameters for one factor sum to zero. (Intercept) is the grand mean that gets added to all group effects.

In your example, you could tell glm() to use deviation coding to get the same parameter estimates as with loglm() (see below for an example), or you convert the parameter estimates from treatment coding as follows:

  • w = 0.05 and s = 2 is the reference cell: glm() 9.5104005 = loglm() 10.18939 + -1.04596513 + 0.36697307
  • w = 0.1 and s = 2 is reference level for s but needs the difference from w = 0.1 to the reference w = 0.05: glm() 9.5104005 + 0.6340290 = loglm() 10.18939 + -0.41193617 + 0.36697307
  • w = 0.1 and s = 3 but needs the difference from w = 0.1 to the reference w = 0.05 and the difference from s = 3 to the reference s = 2: glm() 9.5104005 + 0.6340290 + -0.2153295 = loglm() 10.18939 + -0.41193617 + 0.15164360, and so on

Example with glm() using deviation coding (UCBAdmissions is a cross-table with absolute frequencies built into base R):

> library(MASS)                                # for loglm()
> llmFit <- loglm(~ Admit + Gender + Dept, data=UCBAdmissions)
> coef(llmFit)
$`(Intercept)`
[1] 5.177567

$Admit
  Admitted   Rejected 
-0.2283697  0.2283697 

$Gender
      Male     Female 
 0.1914342 -0.1914342 

$Dept
          A           B           C           D           E           F 
 0.23047857 -0.23631478  0.21427076  0.06663476 -0.23802565 -0.03704367 

> UCBdf <- as.data.frame(UCBAdmissions)  # convert to data frame for glm()
> glmFit <- glm(Freq ~ Admit + Gender + Dept, family=poisson(link="log"),
+               contrasts=list(Admit=contr.sum, Gender=contr.sum, Dept=contr.sum),
+               data=UCBdf)
> coef(glmFit)
(Intercept)      Admit1     Gender1       Dept1       Dept2       Dept3       Dept4 
 5.17756677 -0.22836970  0.19143420  0.23047857 -0.23631478  0.21427076  0.06663476 
      Dept5 
-0.23802565 

Note that glm() does not list those parameter estimates that are fully determined (aliased) through the sum-to-zero constraint for the parameters for one factor.

라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top