Question

I'm trying to use R to conduct Poisson regression on some data that I have. The current structure of the data is as follows:

Data is stratified based on three occupations. There are four levels of income in the data. Within each stratum, for each level of income there is

  1. the number of workplace accidents that have occurred, and
  2. the total man months observed.

Here's an example of the setup. The number in parentheses is the total man months observed and the number not in parentheses is the number of workplace accidents.

My question is how do I set up this data and perform a Poisson regression on the effect of income level on the occurrence of workplace accidents? Ideally I would like to adjust for occupation and find out the effect of only income, but as a starting point, I'm not sure how to set it up as a Poisson regression problem at all. I thought about doing something like dividing the number of injuries by the months of observation, but then that gives non-integer values so I assume that's not the right thing to do.

To reiterate, predictor: income level; response variable: workplace accidents.

BTW, it would be very easy to separate the parentheses numbers and put them into their own column, if that would make sense to do.

I'd really appreciate any suggestions on how to set this up. I am sure other statisticians are working with similarly structured data and might like to gain some insight as well. Thanks so much!


Solution

@thelatemail might be correct in thinking this is better suited for stats.stackexchange.com, but here is some R code. That data is in wide format and you need to restructure it to long format. (You will not want to include the totals columns.) After converting the first four columns to a long format with 'occup' and 'level' as factor-class variables, and accident 'counts' and exposure 'months' as numeric columns, you could use this call to glm:

fit <- glm( counts ~ level + occup + offset(log(months)), data=dfrm, family="poisson")

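The offset needs to be log()-ed because the default link function for the poisson family is the log: the model is linear in the log of the expected count, so the exposure has to enter on that same log scale. Wrapping it in offset() fixes the coefficient of log(months) at 1, which turns the model into one for the accident rate per man month. That is also why you should not divide the counts by the months yourself: the response stays an integer count and the exposure is handled by the offset. exp() of a 'level' coefficient is then a rate ratio relative to the reference income level, adjusted for occupation.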
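To make that concrete, here is a minimal sketch of the fit on a long-format data frame. Every number in it is a made-up placeholder (not the asker's data), and the occupation labels "A", "B", "C" are hypothetical; only the column names follow the glm call above.

```r
# Hypothetical long-format data: 3 occupations x 4 income levels.
# All values are invented placeholders, NOT the data from the question.
dfrm <- expand.grid(level = factor(1:4), occup = factor(c("A", "B", "C")))
dfrm$months <- c(1200, 900, 700, 400, 1500, 1100, 800, 500, 1000, 950, 600, 300)
dfrm$counts <- c(30, 18, 10, 4, 45, 27, 15, 8, 20, 16, 8, 3)

# Poisson regression of counts on income level, adjusted for occupation,
# with log(exposure) entering as an offset (coefficient fixed at 1)
fit <- glm(counts ~ level + occup + offset(log(months)),
           data = dfrm, family = poisson)

# The offset can equally be passed through glm's 'offset' argument
fit2 <- glm(counts ~ level + occup, offset = log(months),
            data = dfrm, family = poisson)

# Coefficients are on the log-rate scale; exponentiate to get rate ratios
# relative to the reference levels (level 1, occupation "A")
rate_ratios <- exp(coef(fit))
print(rate_ratios)
```

summary(fit) gives the usual coefficient table; the 'level' rows are the income effects adjusted for occupation.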

(You cannot really expect us to redo that data entry task, now can you?)
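That said, the wide-to-long step itself can be sketched with base R's reshape(). The layout below is an assumption: one row per occupation, with the parenthesised man months already split into their own columns, and the accident and exposure columns named counts.1 ... counts.4 and months.1 ... months.4. The values are invented placeholders, so substitute your own table.

```r
# Hypothetical wide layout: one row per occupation, one pair of columns
# (accidents, man months) per income level. Placeholder numbers only.
wide <- data.frame(
  occup    = c("A", "B", "C"),
  counts.1 = c(30, 45, 20),       counts.2 = c(18, 27, 16),
  counts.3 = c(10, 15, 8),        counts.4 = c(4, 8, 3),
  months.1 = c(1200, 1500, 1000), months.2 = c(900, 1100, 950),
  months.3 = c(700, 800, 600),    months.4 = c(400, 500, 300)
)

# reshape() splits names like "counts.1" at the "." into a variable name
# ("counts") and a time value (1); here the "time" is the income level
dfrm <- reshape(wide, direction = "long",
                varying = setdiff(names(wide), "occup"),
                idvar = "occup", timevar = "level", sep = ".")

# glm should treat both predictors as categorical
dfrm$level <- factor(dfrm$level)
dfrm$occup <- factor(dfrm$occup)
rownames(dfrm) <- NULL
```

The result has one row per occupation and income level, with the columns occup, level, counts and months that the glm call expects.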

Licensed under: CC-BY-SA with attribution