How to eliminate group of observations under conditions [duplicate]

https://stackoverflow.com/questions/22891316

r
subset

28-06-2023

Question

I have panel data that looks like the following:

id  name    year    dummy
1   Jane    1990    1
1   Jane    1991    1
1   Jane    1992    0
1   Jane    1993    0
2   Tom     1978    0
2   Tom     1979    0
2   Tom     1980    0
3   Jim     1981    1
3   Jim     1982    1
3   Jim     1983    0

I want to subset this data to drop every person who never has a dummy value of 1. In the example above, that means dropping all observations for Tom, because his dummy is 0 in every year. The desired output is:

id  name    year    dummy
1   Jane    1990    1
1   Jane    1991    1
1   Jane    1992    0
1   Jane    1993    0
3   Jim     1981    1
3   Jim     1982    1
3   Jim     1983    0

Is there a way to do this in R? I'm having trouble because it has to be done by id: I don't want to eliminate all observations with dummy equal to 0, only those belonging to an id that never has a 1.

Solution

You can use ave and subset:

subset(dat, as.logical(ave(dummy, id, FUN = any)))

   id name year dummy
1   1 Jane 1990     1
2   1 Jane 1991     1
3   1 Jane 1992     0
4   1 Jane 1993     0
8   3  Jim 1981     1
9   3  Jim 1982     1
10  3  Jim 1983     0

where dat is the name of your data frame.
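To try this out, here is a minimal sketch that rebuilds the example data and exposes the intermediate vector that ave returns. The data.frame() call and its column types are an assumption about how dat is stored, and the names keep and result are only illustrative. The explicit comparison dummy == 1 is an addition here, so that any() always receives a logical vector, whether dummy is stored as integer or double.

```r
# Rebuild the example data from the question (column types are assumed)
dat <- data.frame(
  id    = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  name  = c("Jane", "Jane", "Jane", "Jane", "Tom", "Tom", "Tom",
            "Jim", "Jim", "Jim"),
  year  = c(1990, 1991, 1992, 1993, 1978, 1979, 1980, 1981, 1982, 1983),
  dummy = c(1, 1, 0, 0, 0, 0, 0, 1, 1, 0)
)

# ave() applies FUN within each id and returns a vector as long as its input,
# so every row of an id gets TRUE if any row of that id has dummy == 1
keep <- ave(dat$dummy == 1, dat$id, FUN = any)

# Logical indexing keeps the rows of Jane and Jim and drops all rows of Tom
result <- dat[keep, ]
print(result)
```

Because keep has one element per row, it can be used directly as a row index, or stored as a column if the flag is needed later.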

OTHER TIPS

Or you could just subset and use %in%.

df <- df[df$name %in% df$name[df$dummy > 0],]

where df is your data frame.

This uses only vectorized base R functions, so it should be (a) pretty fast and (b) not dependent on any packages.
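Note that the %in% filter above matches on name, while the question asks for grouping by id. The two agree on the example data, but they can differ if two people share a name. A small sketch of the same idea keyed on id follows; the second "Tom" with id 4 is hypothetical data added purely for illustration, and the names ids_with_one, by_id and by_name are not from the original answer.

```r
# Hypothetical data: two different people (id 2 and id 4) are both named Tom
df <- data.frame(
  id    = c(1, 1, 2, 2, 3, 3, 4, 4),
  name  = c("Jane", "Jane", "Tom", "Tom", "Jim", "Jim", "Tom", "Tom"),
  year  = c(1990, 1991, 1978, 1979, 1981, 1982, 1990, 1991),
  dummy = c(1, 0, 0, 0, 1, 0, 1, 0)
)

# Collect the ids that have at least one row with dummy == 1
ids_with_one <- unique(df$id[df$dummy == 1])

# Keep every row belonging to one of those ids; id 2 is dropped
by_id <- df[df$id %in% ids_with_one, ]

# Matching on name instead keeps id 2 as well, because id 4 shares the name
by_name <- df[df$name %in% df$name[df$dummy > 0], ]
```

When names are unique per id, both versions give the same rows.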

An option using data.table:

library(data.table)
# Keep all rows (.SD) of an id only if at least one of its dummy values is 1
setDT(df)[, if (any(dummy == 1)) .SD, by = id]

Licensed under: CC-BY-SA with attribution
