How to interpret R linear regression when there are multiple factor levels as the baseline? [closed]

StackOverflow https://stackoverflow.com/questions/21677105

  •  09-10-2022

My data has 3 independent variables, all of which are categorical:

condition: cond1, cond2, cond3

population: A,B,C

task: 1,2,3,4,5

The dependent variable is the task completion time. I ran lm(time ~ condition + user + task, data) in R, where user is the population variable, and got the following results:

[Image: regression summary output, not preserved]

What confuses me is that cond1, groupA, and task1 are left out of the results. From the thread "linear regression "NA" estimate just for last coefficient", I understand that one factor level is chosen as the "baseline" and shown in the (Intercept) row.

But what if there are multiple factor levels used as the baseline, as in the above case?

  • Does the (Intercept) row now indicate cond1+groupA+task1?
  • What if I want to know the coefficient and significance for cond1, groupA, and task1 individually?
  • For example, groupB has an estimated coefficient +9.3349, compared to groupA? Or compared to cond1+groupA+task1?

Solution

Every person in your data must have one value for each of the variables 'condition', 'population' and 'task', so the baseline individual must also have a value for each of these variables; in this case cond1, A and t1. All of the results are expressed relative to this reference individual, so the intercept does give the estimated mean time for cond1, groupA and task1.

Asking for the coefficient or significance of cond1, groupA or task1 makes no sense, because significance here means a significantly different mean value between one group and the reference group. You cannot compare the reference group against itself.

As your model has no interactions, the coefficient for groupB means that the mean time for somebody in population B is estimated to be 9.33 (seconds?) higher than the time for somebody in population A, regardless of the condition and task they are performing. As the p-value is very small, you can conclude that the mean time does in fact differ between people in population B and people in the reference population (A). If you added an interaction term to the model, the interaction coefficients (for example usergroupB:taskt4) would indicate the extra value added to (or subtracted from) the mean time when an individual has both characteristics (in this example, when an individual is from population B and has performed task 4). These effects would be added to the main effects (usergroupB and taskt4).
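To make the point about additive and interaction terms concrete, here is a minimal sketch on simulated data. The data frame `d`, the effect sizes and the helper `new_row()` are all invented for illustration and are not the asker's data; only the variable and level names follow the question.

```r
# Simulated data: every number below is made up for illustration only.
set.seed(1)
d <- expand.grid(
  condition = c("cond1", "cond2", "cond3"),
  user      = c("groupA", "groupB", "groupC"),
  task      = paste0("t", 1:5),
  rep       = 1:4,
  stringsAsFactors = TRUE
)
eff_cond <- c(cond1 = 0, cond2 = 2, cond3 = 5)
eff_user <- c(groupA = 0, groupB = 9, groupC = 4)
eff_task <- c(t1 = 0, t2 = 1, t3 = 3, t4 = 6, t5 = 2)
d$time <- 20 +
  unname(eff_cond[as.character(d$condition)] +
         eff_user[as.character(d$user)] +
         eff_task[as.character(d$task)]) +
  rnorm(nrow(d), sd = 2)

# Hypothetical helper: one-row data frame with the same factor levels as d.
new_row <- function(condition, user, task) {
  data.frame(
    condition = factor(condition, levels = levels(d$condition)),
    user      = factor(user,      levels = levels(d$user)),
    task      = factor(task,      levels = levels(d$task))
  )
}

# Additive model, as in the question.
fit <- lm(time ~ condition + user + task, data = d)

# The intercept is the fitted value for cond1 / groupA / t1.
same_intercept <- isTRUE(all.equal(
  unname(predict(fit, new_row("cond1", "groupA", "t1"))),
  unname(coef(fit)["(Intercept)"])
))

# The groupB coefficient is the B minus A difference at any condition and task,
# here checked at cond3 / t4 rather than at the base levels.
diff_BA <- predict(fit, new_row("cond3", "groupB", "t4")) -
           predict(fit, new_row("cond3", "groupA", "t4"))
same_diff <- isTRUE(all.equal(unname(diff_BA), unname(coef(fit)["usergroupB"])))

# With a user:task interaction, the interaction term is added on top of the
# two main effects for an individual who is in groupB and performed t4.
fit_int <- lm(time ~ condition + user * task, data = d)
b <- coef(fit_int)
manual <- b["(Intercept)"] + b["usergroupB"] + b["taskt4"] + b["usergroupB:taskt4"]
same_int <- isTRUE(all.equal(
  unname(manual),
  unname(predict(fit_int, new_row("cond1", "groupB", "t4")))
))

c(same_intercept, same_diff, same_int)
```

All three checks compare a coefficient-based calculation with `predict()`, so they hold for any data set with this structure, not just for these simulated numbers.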

Hope I helped.

Other tips

Does the (Intercept) row now indicate cond1+groupA+task1?

Yes.

What if I want to know the coefficient and significance for cond1, groupA, and task1 individually?

Think about what significance means. You need to formulate a hypothesis. In your example everything is compared to the intercept and your question doesn't really make sense. However, you can always conduct pairwise comparisons between all possible effect combinations (see package multcomp).
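As a rough sketch of how such comparisons could be obtained: the simulated data below is invented for illustration, and the column name `user_refB` is hypothetical. The base R route refits the model with a different reference level; the multcomp route is only attempted if that package happens to be installed.

```r
# Simulated data: the effect sizes are made up for illustration only.
set.seed(2)
d <- expand.grid(
  condition = c("cond1", "cond2", "cond3"),
  user      = c("groupA", "groupB", "groupC"),
  task      = paste0("t", 1:5),
  rep       = 1:3,
  stringsAsFactors = TRUE
)
d$time <- 20 + 9 * (d$user == "groupB") + 4 * (d$user == "groupC") +
  rnorm(nrow(d), sd = 2)

fit <- lm(time ~ condition + user + task, data = d)

# Base R: make groupB the reference level and refit.
# The row user_refBgroupC is then the groupC vs groupB comparison.
d$user_refB <- relevel(d$user, ref = "groupB")
fit_refB <- lm(time ~ condition + user_refB + task, data = d)
coef(summary(fit_refB))["user_refBgroupC", ]

# Optional: all pairwise comparisons between user groups with a
# multiplicity adjustment, assuming the multcomp package is available.
if (requireNamespace("multcomp", quietly = TRUE)) {
  pairs <- multcomp::glht(fit, linfct = multcomp::mcp(user = "Tukey"))
  print(summary(pairs))
}
```

Releveling does not change the fitted model, only which differences are reported as coefficients: the groupC vs groupB estimate equals the difference of the usergroupC and usergroupB coefficients from the original fit.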

For example, groupB has an estimated coefficient +9.3349, compared to groupA? Or compared to cond1+groupA+task1?

It's the difference between cond1/task1/groupA and cond1/task1/groupB. (As @Rufo correctly points out, it is of course an overall effect and actually the difference between groupB and groupA provided the other effects are equal.)

By default, R uses treatment contrasts for categorical variables (unordered factors). Hence, the first level is treated as the base level. All remaining levels are compared with the base level.

Your base levels are cond1 for condition, A for population, and 1 for task. All coefficients are estimated in relation to these base levels.

The intercept is just the estimated mean of the response variable when all three factors are at their base levels.
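The treatment coding described above can be inspected directly. This is a small sketch on a toy factor (the four values are made up) that shows why the base level gets no row of its own in the coefficient table.

```r
# Toy factor; the values are invented for illustration.
condition <- factor(c("cond1", "cond2", "cond3", "cond2"))

levels(condition)       # the first level listed is the base level
getOption("contrasts")  # contrast functions in use; R's default for
                        # unordered factors is contr.treatment
contrasts(condition)    # the base level's row is all zeros

# The design matrix has an intercept plus one dummy column per non-base
# level, so there is no separate column (and no coefficient) for cond1.
X <- model.matrix(~ condition)
colnames(X)
# "(Intercept)" "conditioncond2" "conditioncond3"
```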

For example, the effect conditioncond2 is the difference between cond2 and cond1 with population and task held fixed. Because the model contains no interaction terms, that difference is assumed to be the same at every level of the other factors, not only where population is A and task is 1; it would apply to the base levels only if interactions were included. (Analogously, conditioncond3 is the difference between cond3 and cond1.)

The same is true for the other factors. Without interactions, the effects of population hold at every condition and task, and the effects of task hold at every condition and population.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow