I have a dataset like this:
        id       date  sector region     price      score
1  SWECEC2 2011-12-31 Finance     R1  2.297936 -1.0000000
2  FRACPY1 2011-12-31    FMCG     R1 10.500936 -0.8571429
3  FRAACO1 2011-12-31 Finance     R1 19.228936 -0.7142857
4  GERAIM1 2011-12-31 Finance     R1  4.936936 -0.5714286
5  GERAKX1 2011-12-31 Finance     R1  8.140936 -0.5714286
6  NETAHI1 2011-12-31    FMCG     R1 -7.229064 -0.5714286
7  UKIJLK1 2011-12-31 Finance     R2 17.766936 -0.5714286
8  FRAAHJ1 2011-12-31 Finance     R2  5.840936 -0.4285714
9  FRACLB1 2011-12-31    FMCG     R2 -4.490064 -0.4285714
10 GERAIO1 2011-12-31    FMCG     R2  2.338936 -0.4285714
I need to add a new column to this dataset that rates each row within its group of date, sector and region: 1 if the score is less than the average score for that group, otherwise 2. For example, the average score for date = 2011-12-31, sector = Finance, region = R1 is -0.7142857, and the average score for date = 2011-12-31, sector = FMCG, region = R2 is -0.4285714, and so on.
The new column for the first group should then look like this:
id      date       sector  region price     score      Rating
SWECEC2 12/31/2011 Finance R1     2.297936  -1         1
FRAACO1 12/31/2011 Finance R1     19.228936 -0.7142857 2
GERAIM1 12/31/2011 Finance R1     4.936936  -0.5714286 2
GERAKX1 12/31/2011 Finance R1     8.140936  -0.5714286 2
I thought this could be done with ddply, but I can't work out how, because I am not summarizing the data here, just creating a new column based on each group in the data.
Any suggestions?
Thanks!
A sample of my dataset is here:
ds <- structure(list(id = structure(c(9L, 4L, 1L, 5L, 7L, 8L, 10L,
2L, 3L, 6L), .Label = c("FRAACO1", "FRAAHJ1", "FRACLB1", "FRACPY1",
"GERAIM1", "GERAIO1", "GERAKX1", "NETAHI1", "SWECEC2", "UKIJLK1"
), class = "factor"), date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "12/31/2011", class = "factor"),
sector = structure(c(1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L,
2L), .Label = c("Finance", "FMCG"), class = "factor"), region = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("R1", "R2"
), class = "factor"), price = c(2.297936, 10.500936, 19.228936,
4.936936, 8.140936, -7.229064, 17.766936, 5.840936, -4.490064,
2.338936), score = c(-1, -0.8571429, -0.7142857, -0.5714286,
-0.5714286, -0.5714286, -0.5714286, -0.4285714, -0.4285714,
-0.4285714)), .Names = c("id", "date", "sector", "region",
"price", "score"), class = "data.frame", row.names = c(NA, -10L
))
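
For reference, the calculation described above can be sketched in base R. This is a minimal sketch under stated assumptions, not a definitive answer: it uses `ave()` (whose default function is `mean`) to get the per-group average aligned to each row, rebuilds the sample as a compact `data.frame` so the snippet runs on its own, and introduces `group_mean` as an illustrative name that is not part of the original data.

```r
# Compact copy of the sample data (same values as the rows shown above),
# rebuilt here only so that the snippet is self-contained.
ds <- data.frame(
  id     = c("SWECEC2", "FRACPY1", "FRAACO1", "GERAIM1", "GERAKX1",
             "NETAHI1", "UKIJLK1", "FRAAHJ1", "FRACLB1", "GERAIO1"),
  date   = "12/31/2011",
  sector = c("Finance", "FMCG", "Finance", "Finance", "Finance",
             "FMCG", "Finance", "Finance", "FMCG", "FMCG"),
  region = rep(c("R1", "R2"), times = c(6, 4)),
  price  = c(2.297936, 10.500936, 19.228936, 4.936936, 8.140936,
             -7.229064, 17.766936, 5.840936, -4.490064, 2.338936),
  score  = c(-1, -0.8571429, -0.7142857, -0.5714286, -0.5714286,
             -0.5714286, -0.5714286, -0.4285714, -0.4285714, -0.4285714)
)

# group_mean (illustrative name): mean score of each date/sector/region
# group, repeated for every row of that group, in the original row order.
group_mean <- ave(ds$score, ds$date, ds$sector, ds$region)

# 1 if the row's score is strictly below its group mean, otherwise 2.
ds$Rating <- ifelse(ds$score < group_mean, 1, 2)

# Rows of the first group (Finance, R1), for comparison with the
# expected output in the question.
print(ds[ds$sector == "Finance" & ds$region == "R1", ])
```

Because nothing is summarized, the data frame keeps all ten rows and only gains the `Rating` column. If plyr is preferred, the same idea should be expressible by passing `transform` to `ddply` instead of `summarise`, though that variant is not shown or tested here.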