Extracting rows from selected column in repeated/grouped data

https://stackoverflow.com/questions/23586616

r
dataframe

19-07-2023
|

سؤال

I have a dataframe (B) given below which contains some repeated entries for various ids (x_1). For each id in x_1, I'm interested in extracting rows which have minimum values (as well as their repeats) in the second column (z_1). For example, for A1, I would like to extract rows 5 and 6. I would also like to keep rows that don't have more than 1 value under the z_1 column-for example, all other entries apart from A1 and B10 should be included in the ouput.

x_1 <- c("A1", "A1", "A1", "A1", "A1", "A1", "B10", "B10", "B10","B10", "B500", "B500", "C100", "C100", "C100", "D40", "D40", "G100", "G100")
z_1 <- c(rep(1.87, 2), rep(1.56, 2), rep(1.15, 2), rep(1.60, 2), rep(1.44, 2), rep(1.34, 2), rep(1.50, 3), rep(1.90, 2), rep(1.59, 2))
z_2 <- c( c(0.5, 1, 0.5, 1, 0.5, 1), c(0.2, 0.4, 0.2, 0.4), c(0.3, 0.6), c(0.6, 1.2, 1.8), c(0.25, 0.5), c(0.15, 0.3))

Here's the dataframe:

B <- data.frame(x_1, z_1, z_2)

x_1  z_1  z_2
 A1  1.87 0.50
 A1  1.87 1.00
 A1  1.56 0.50
 A1  1.56 1.00
 A1  1.15 0.50
 A1  1.15 1.00
B10  1.60 0.20
B10  1.60 0.40
B10  1.44 0.20
B10  1.44 0.40
B500 1.34 0.30
B500 1.34 0.60
C100 1.50 0.60
C100 1.50 1.20
C100 1.50 1.80
D40  1.90 0.25
D40  1.90 0.50
G100 1.59 0.15
G100 1.59 0.30

Here's the result I would like to obtain:

y_1  d_1  d_2
A1   1.15 0.50
A1   1.15 1.00
B10  1.44 0.20
B10  1.44 0.40
B500 1.34 0.30
B500 1.34 0.60
C100 1.50 0.60
C100 1.50 1.20
C100 1.50 1.80
D40  1.90 0.25
D40  1.90 0.50
G100 1.59 0.15
G100 1.59 0.30

Just to add that I've tried a few things which I found from other similar posted questions but none of these is giving the desired output:

aggregate(grouped_B$x_1, by = list(grouped_B$z_1), min)
do.call("rbind", by(B, B$x_1, function(x) x[which.min(unique(x$z_1)), ]))

Also just to add that the data-frame I'm working on is a grouped data:

grouped_B <- groupedData(z_1 ~ z_2 | x_1,      
                         data = B, FUN = mean,
                         labels = list( x = "duration",
                         y = "height"), units = list("(years)"))

I would really appreciate any useful hints/codes.

المحلول

Pretty simple in base R as well:

result <- merge(B,aggregate(z_1~x_1,B,min))
result
#     x_1  z_1  z_2
# 1    A1 1.15 0.50
# 2    A1 1.15 1.00
# 3   B10 1.44 0.20
# 4   B10 1.44 0.40
# 5  B500 1.34 0.30
# 6  B500 1.34 0.60
# 7  C100 1.50 0.60
# 8  C100 1.50 1.20
# 9  C100 1.50 1.80
# 10  D40 1.90 0.25
# 11  D40 1.90 0.50
# 12 G100 1.59 0.15
# 13 G100 1.59 0.30

نصائح أخرى

If each row in your data set is unique, you could do this simply with data.table

library(data.table)
DT <- setDT(B)[, min(z_1), by = c("x_1", "z_2")]
setnames(DT, 1:3, c("y_1", "d_2", "d_1"))
#      y_1  d_2  d_1
#  1:   A1 0.50 1.15
#  2:   A1 1.00 1.15
#  3:  B10 0.20 1.44
#  4:  B10 0.40 1.44
#  5: B500 0.30 1.34
#  6: B500 0.60 1.34
#  7: C100 0.60 1.50
#  8: C100 1.20 1.50
#  9: C100 1.80 1.50
# 10:  D40 0.25 1.90
# 11:  D40 0.50 1.90
# 12: G100 0.15 1.59
# 13: G100 0.30 1.59

If it's not unique, could do

DT <- setDT(B)[, list(d_1 = z_1[grep(min(z_1), z_1)],
                d_2 = z_2[grep(min(z_1), z_1)]), by = c("x_1")]

#      x_1  d_1  d_2
#  1:   A1 1.15 0.50
#  2:   A1 1.15 1.00
#  3:  B10 1.44 0.20
#  4:  B10 1.44 0.40
#  5: B500 1.34 0.30
#  6: B500 1.34 0.60
#  7: C100 1.50 0.60
#  8: C100 1.50 1.20
#  9: C100 1.50 1.80
# 10:  D40 1.90 0.25
# 11:  D40 1.90 0.50
# 12: G100 1.59 0.15
# 13: G100 1.59 0.30

مرخصة بموجب: CC-BY-SA مع الإسناد

لا تنتمي إلى StackOverflow