Question

I have data on transfer payments for thousands of people over several years, with monthly entries indicating whether an observation received a payment that month or not. I want to find out whether certain types of transfer receivers proposed by theory can be confirmed by the data. To do so, I plan to first compute some descriptive statistics and later use the TraMineR package.

First, however, I want to simply figure out which observation fits which category. One such category, for example, is short-term receivers of financial aid who only show up once. Thus, I need to identify all observations that received payments for three months or less. In addition, these periods of receiving aid cannot be interrupted: if someone received aid for two months, then nothing for two, and then one month again, this would already be a different category. Here is a small example covering only one year for 30 observations:

dat <- data.frame(matrix(c(0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0 , 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0),ncol=12))

In this example, my problem is row 13; otherwise I could simply use rowSums and then pick every row with a sum less than or equal to 3. Which procedure could I use to identify only those observations that received aid in a single connected period? And how would I identify observations such as row 13?
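To make the problem concrete, here is a minimal sketch with two made-up rows (not taken from dat; the names toy and months_total are only illustrative). Both rows have a row sum of 3, even though only the first one is a single uninterrupted period, so rowSums alone cannot tell them apart:

```r
# Two hypothetical people observed over 12 months (1 = aid received that month)
toy <- rbind(
  continuous  = c(0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0),  # one spell of 3 months
  interrupted = c(1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0)   # 2 months, a gap, then 1 month
)

# Total months of aid per person: identical for both rows
months_total <- rowSums(toy)
print(months_total)
```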


Solution

You can apply rle (run-length encoding) to each row to get the number of contiguous periods of payment and the number of months in each period:

aid <- lapply(apply(dat, 1, rle), function(x) unname(x$lengths[x$values==1]))

This returns a list with one component per row of your data. For instance:

> aid[[1]]
integer(0)
> aid[[8]]
[1] 3
> aid[[13]]
[1] 1 1

indicating no period for row 1, one period of 3 months for row 8 and two periods of 1 month for row 13.
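To see what rle does for a single row, here is a small sketch on one made-up vector (not a row of dat). rle splits the row into runs of identical values; keeping only the runs whose value is 1 gives the length of each payment period, which is exactly what the lapply call above does for every row:

```r
# A single made-up row: aid in months 3-5 and again in month 9
x <- c(0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0)

r <- rle(x)
r$lengths  # length of each run: 2 3 3 1 3
r$values   # value of each run:  0 1 0 1 0

# Keep only the runs of 1s, i.e. the payment periods
spells <- r$lengths[r$values == 1]
spells     # 3 1 -> two separate periods, of 3 months and 1 month
```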

To find out how many contiguous periods each row has, you can use this:

cont <- sapply(aid, length)

Result:

> cont
[1] 0 1 1 0 0 0 1 1 0 0 1 1 2 0 1 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0
> cont[13]
[1] 2

Note that only row 13 has two separate periods.
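Combining aid and cont then picks out the short-term receivers from the question: rows with exactly one period that lasts at most 3 months. The sketch below builds aid and cont the same way as above, but on a small made-up matrix so it runs on its own; the name short_term is only illustrative. With the real data, the last two lines can be applied directly to the aid and cont computed from dat:

```r
# Made-up data: 4 people x 12 months (1 = aid received that month)
toy <- rbind(
  c(0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0),  # one period of 3 months -> short-term
  c(1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0),  # two periods            -> not short-term
  c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),  # never received aid     -> not short-term
  c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0)   # one period of 5 months -> not short-term
)

# Same construction as in the answer: lengths of the runs of 1s per row
aid  <- lapply(apply(toy, 1, rle), function(x) unname(x$lengths[x$values == 1]))
cont <- sapply(aid, length)

# Short-term receivers: exactly one period, lasting at most 3 months
short_term <- cont == 1 & sapply(aid, function(a) sum(a) <= 3)
which(short_term)  # 1
```

Requiring cont == 1 also excludes people who never received aid, since their cont is 0.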

OTHER TIPS

You can use the rle function to find the rows in which the value 1 appears in more than one separate run, i.e. aid received in separate periods:

idx <- apply(dat,1,function(x){
  y <- rle(x)
  length(y$lengths[y$values ==1])> 1
})

dat[idx,]
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
13  0  0  0  0  0  0  0  0  1   0   0   1

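Then you can apply rowSums to the remaining rows (those with at most one period of payment):

rowSums(dat[!idx,]) <= 3

Note that this is also TRUE for rows that never received aid (row sum 0).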
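A different technique, not used in either answer, avoids the row-by-row rle call by counting where each period starts: a period starts in month 1 if aid was received then, or in any later month where aid was received but not in the month before. The sketch below assumes the data are first converted to a numeric 0/1 matrix (for dat, as.matrix(dat) should do); m, first_month, later_start and n_spells are illustrative names on made-up data:

```r
# Made-up data: 3 people x 6 months (1 = aid received that month)
m <- rbind(
  c(1, 1, 0, 0, 0, 0),  # one period starting in month 1
  c(0, 1, 0, 0, 1, 1),  # two periods
  c(0, 0, 0, 0, 0, 0)   # no aid
)

# A period starts in month 1 if aid was received then ...
first_month <- m[, 1] == 1
# ... or in a later month with aid when the previous month had none
later_start <- m[, -1, drop = FALSE] == 1 & m[, -ncol(m), drop = FALSE] == 0
n_spells <- as.integer(first_month) + rowSums(later_start)

n_spells                          # 1 2 0
n_spells == 1 & rowSums(m) <= 3   # TRUE FALSE FALSE
```

Because the comparisons work on whole columns at once, this should scale well to thousands of people and many months, and n_spells == 1 already excludes rows without any aid.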
Licensed under: CC-BY-SA with attribution