Simple data-manipulation in R
07-11-2019
Question
@Aniko points out that one way to view my problem is that I need to find the connected components of a graph, where the vertices are the groups, and the variables group and nominated_group indicate an edge between two groups. My goal is to create a variable parent_group which indexes the connected components. Or, as I put it before:
I have a dataframe with four variables: ID, group, nominated_ID, and nominated_group.
Consider sister-groups: Groups A and B are sister-groups if there is at least one case in the data where group==A and nominated_group==B, or vice versa.
I would like to create a variable parent_group which takes on a unique value for each set of sister-groups. In other words, no nominations should occur between cases in different parent_groups. Numbering the parent_group values sequentially seems like a good idea.
Many thanks for the help I have already received here! I can't really contribute here, but note that I try to pay it forward at stats.exchange and on Wikipedia.
In my fake data, A and B are sister-groups. Either row 4 (ID=4, in group B, nominating a case in A) or row 5 (ID=3, in group A, nominating a case in B) is sufficient to make this true. Each group is also its own sister-group. The goal, the creation of parent_group, should result in one parent_group for all cases in A or B, and another parent_group for group C.
df <- data.frame(ID = c(9, 5, 2, 4, 3, 7),
                 group = c("A", "A", "B", "B", "A", "C"),
                 nominated_ID = c(9, 8, 4, 9, 2, 7))
df$nominated_group <- with(df, group[match(nominated_ID, ID)])
df
  ID group nominated_ID nominated_group
1  9     A            9               A
2  5     A            8            <NA>
3  2     B            4               B
4  4     B            9               A
5  3     A            2               B
6  7     C            7               C
No correct solution
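The connected-components view described in the question can be sketched in base R. This is only an illustration, not an accepted answer: the helper name label_components is hypothetical, and it uses simple label propagation (every edge repeatedly pulls both endpoints down to the smaller label until nothing changes). It assumes that a nomination pointing outside the data (nominated_group is NA) contributes no edge.

```r
# Fake data from the question.
df <- data.frame(ID = c(9, 5, 2, 4, 3, 7),
                 group = c("A", "A", "B", "B", "A", "C"),
                 nominated_ID = c(9, 8, 4, 9, 2, 7),
                 stringsAsFactors = FALSE)
df$nominated_group <- with(df, group[match(nominated_ID, ID)])

# Hypothetical helper (not from the question): label the connected
# components of the undirected graph with edges (from[i], to[i]).
label_components <- function(from, to) {
  from <- as.character(from)
  to   <- as.character(to)
  # Every group is a vertex, even if all of its nominations are NA
  # (sort() drops NA by default).
  nodes <- sort(unique(c(from, to)))
  # Assumption: a nomination outside the data (NA) adds no edge.
  keep <- !is.na(from) & !is.na(to)
  from <- from[keep]
  to   <- to[keep]

  # Start with one distinct label per vertex.
  label <- setNames(seq_along(nodes), nodes)
  repeat {
    new_label <- label
    for (i in seq_along(from)) {
      # Both endpoints of an edge take the smaller of their two labels.
      m <- min(new_label[from[i]], new_label[to[i]])
      new_label[c(from[i], to[i])] <- m
    }
    if (identical(new_label, label)) break  # nothing changed: done
    label <- new_label
  }
  # Renumber so the component labels are sequential: 1, 2, ...
  setNames(match(label, sort(unique(label))), nodes)
}

comp <- label_components(df$group, df$nominated_group)
df$parent_group <- unname(comp[as.character(df$group)])
df$parent_group
# 1 1 1 1 1 2  (A and B share a parent_group; C gets its own)
```

The loop is quadratic in the worst case, which should be fine for a modest number of groups. If the igraph package is available, building an undirected graph from the same two columns and calling its components() function should yield an equivalent membership vector.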