How to subtract first entry from last entry in grouped data

https://stackoverflow.com/questions/23633890

r
dataframe

21-07-2023
Question

I would appreciate some help with the following task. From the data frame below (C), for each id I would like to subtract the first entry in column d_2 from the last entry, and then store the results in another data frame containing the same ids. I can then merge this with my initial data frame. Please note that the subtraction has to be in this order (last entry minus first entry for each id).

Here is the code:

id <- c("A1", "A1", "B10","B10", "B500", "B500", "C100", "C100", "C100", "D40", "D40", "G100", "G100")

d_1 <- c( rep(1.15, 2), rep(1.44, 2), rep(1.34, 2), rep(1.50, 3), rep(1.90, 2), rep(1.59, 2))

set.seed(2)

d_2 <- round(runif(13, -1, 1), 2)

C <- data.frame(id, d_1, d_2)

id   d_1   d_2
A1   1.15 -0.63
A1   1.15  0.40
B10  1.44  0.15
B10  1.44 -0.66
B500 1.34  0.89
B500 1.34  0.89
C100 1.50 -0.74
C100 1.50  0.67
C100 1.50 -0.06
D40  1.90  0.10
D40  1.90  0.11
G100 1.59 -0.52
G100 1.59  0.52

Desired result:

id2 <- c("A1", "B10", "B500", "C100", "D40", "G100")

difference <- c(1.03, -0.81, 0, 0.68, 0.01, 1.04)

diff_df <- data.frame(id2, difference)

id2    difference
A1        1.03
B10      -0.81
B500      0.00
C100      0.68
D40       0.01
G100      1.04

I attempted this by using ddply to obtain the first and last entries, but I'm struggling with what to supply as the function argument in the second call (below) to get the desired outcome.

C_1 <- ddply(C, .(id), function(x) x[c(1, nrow(x)), ])

ddply(C_1, .(patient), function )

To be honest, I'm not very familiar with ddply (from the plyr package); I got the code above from another post on Stack Exchange.

My original data is a groupedData object, and I believe another way of approaching this is to use gapply, but again I'm struggling with the third argument (usually a function):

grouped_C <- groupedData(d_1 ~ d_2 | id, data = C, FUN = mean, labels = list( x = "", y = ""), units = list(""))

x1 <- gapply(grouped_C, "d_2", first_entry)

x2 <- gapply(grouped_C, "d_2", last_entry)

where first_entry and last_entry are functions that return the first and last entries. I could then get the difference with x2 - x1. However, I'm not sure what to use as first_entry and last_entry in the code above (perhaps something to do with head or tail?).

Any help would be much appreciated.

Solution

This can be done easily with dplyr. The last and first functions are very helpful for this task.

library(dplyr)                # load dplyr (install the package first if needed)

diff_df <- C %>%              # start from C; %>% pipes each result into the next step, so C is not repeated
  group_by(id) %>%            # group the rows of C by id
  summarize(difference = last(d_2) - first(d_2))   # one row per id: last entry of d_2 minus the first entry

#    id difference             #this is the result stored in diff_df
#1   A1       1.03
#2  B10      -0.81
#3 B500       0.00
#4 C100       0.68
#5  D40       0.01
#6 G100       1.04

Edit note: updated the post to use %>% instead of %.%, which is deprecated.
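
For readers who would rather not depend on dplyr, the head/tail idea raised in the question can be sketched in base R, followed by the merge back into the original data frame that the question mentions. This is only a minimal sketch: the values of d_2 are typed in from the table in the question instead of being regenerated with runif, and last_minus_first and C_merged are hypothetical names introduced here for illustration.

```r
# Data copied from the table in the question (d_2 typed in literally)
C <- data.frame(
  id  = c("A1", "A1", "B10", "B10", "B500", "B500",
          "C100", "C100", "C100", "D40", "D40", "G100", "G100"),
  d_1 = c(rep(1.15, 2), rep(1.44, 2), rep(1.34, 2),
          rep(1.50, 3), rep(1.90, 2), rep(1.59, 2)),
  d_2 = c(-0.63, 0.40, 0.15, -0.66, 0.89, 0.89,
          -0.74, 0.67, -0.06, 0.10, 0.11, -0.52, 0.52)
)

# Hypothetical helper: last entry minus first entry of a vector
last_minus_first <- function(x) tail(x, 1) - head(x, 1)

# Apply the helper to d_2 within each id
diffs <- tapply(C$d_2, C$id, last_minus_first)

# One row per id, in the shape of the desired result
diff_df <- data.frame(id = names(diffs), difference = as.vector(diffs))

# Attach each id's difference to every row of the original data
C_merged <- merge(C, diff_df, by = "id")
```

Because tail(x, 1) and head(x, 1) return the same element for a group with a single row, this version gives 0 for such a group.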

Other tips

If you have any singletons and they need to be left alone, then this will solve your problem. It's the same as docendo discimus's answer, but with an if-else component to deal with the singleton cases:

library(dplyr)               
diff_df <- C %>%             
   group_by(id) %>%
   summarize(difference = if(n() > 1) last(d_2) - first(d_2) else d_2)
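
To see what the if-else changes, here is a small sketch on a made-up data frame. C2 and the singleton id "Z9" are hypothetical and not part of the question's data; the column names plain and guarded are also just for illustration.

```r
library(dplyr)

# Hypothetical data: "A1" has two rows, "Z9" has only one
C2 <- data.frame(
  id  = c("A1", "A1", "Z9"),
  d_2 = c(-0.63, 0.40, 0.25)
)

res <- C2 %>%
  group_by(id) %>%
  summarize(
    # without the guard, a singleton yields last - first = 0
    plain   = last(d_2) - first(d_2),
    # with the guard, a singleton keeps its own d_2 value
    guarded = if (n() > 1) last(d_2) - first(d_2) else d_2
  )

print(res)
# A1: both columns give 1.03
# Z9: plain gives 0, guarded keeps 0.25
```

Which behaviour is right depends on what a singleton should mean in your data: a difference of zero, or the untouched value.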

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow