In R, how can I grab the penultimate row from a data frame for each ID when IDs are non-unique? [duplicate]

StackOverflow https://stackoverflow.com/questions/23597838

  •  20-07-2023

Question

I have data with the following format. There is a non-unique ID, the number of times it's shown up, and more data.

I want to add the penultimate row for each ID to a new table, i.e. a2 and b4.

What are a couple methods for accomplishing this?

ID  #   data
a   1   ...
a   2   ...
a   3   ...

b   1   ...
b   2   ...
b   3   ...
b   4   ...
b   5   ...
...

Solution

In addition to @Ben's answer and those in the linked duplicate question, you could use dplyr to achieve this:

library(dplyr)
df %>%                 # your data.frame
 group_by(ID) %>% 
 mutate(count = 1:n()) %>%                  # running row number within each ID
 filter(count %in% max(c(count-1,1))) %>%   # if every ID occurs more than once, this simplifies to filter(count %in% max(count-1))
 select(-count)

This can also be written in a single line:

df %>% group_by(ID) %>% mutate(count = 1:n()) %>% filter(count %in% max(c(count-1,1))) %>% select(-count)
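As a rough check of the same idea on current dplyr releases, where the pipe is `%>%` rather than the older `%.%`, here is a minimal sketch. The sample data frame, the column name `num` and the result name `penult` are made up for illustration, and `row_number()` with `n()` stands in for the temporary `count` column.

```r
library(dplyr)

# Hypothetical sample data: "a" has 3 rows, "b" has 5, "c" has only 1
df <- data.frame(
  ID  = c(rep("a", 3), rep("b", 5), "c"),
  num = c(1:3, 1:5, 1),
  stringsAsFactors = FALSE
)

penult <- df %>%
  group_by(ID) %>%
  # keep row n-1 of each group, falling back to row 1 for single-row groups
  filter(row_number() == max(n() - 1, 1)) %>%
  ungroup()

print(penult)
```

With this data the result should hold a2, b4 and the lone c1 row.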

OTHER TIPS

I would use plyr::ddply:

library(plyr)
# last two rows of the group, then the first of those
penult <- function(x) head(tail(x,2),1)
ddply(mydata,"ID",penult)

Somewhat to my surprise this actually works fine in the edge case (only one row per ID), because tail(x,2) returns a single row in that case.
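The edge-case behaviour can be seen without plyr. The following is a base-R sketch that uses `split`/`lapply` in place of `ddply`; the sample data, the column name `num` and the variable `result` are assumptions for illustration only.

```r
# Hypothetical sample data: "c" occurs only once
mydata <- data.frame(
  ID  = c("a", "a", "a", "b", "b", "b", "b", "b", "c"),
  num = c(1, 2, 3, 1, 2, 3, 4, 5, 1)
)

# last two rows of a group, then the first of those
penult <- function(x) head(tail(x, 2), 1)

# apply penult to each ID's rows and stack the one-row results
result <- do.call(rbind, lapply(split(mydata, mydata$ID), penult))
print(result)
```

For ID "c", `tail(x, 2)` has just one row, so `head(..., 1)` returns that row instead of failing.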

 mydata[ tapply( rownames(mydata), mydata$ID, function(n) n[ max(1, length(n)-1) ] ), ]

No testing in absence of a valid example. The edge case of a single row for an ID was not considered in your problem formulation so I decided to use the solitary row in that situation.
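A small test of this indexing approach might look like the sketch below. The data and the names `idx` and `penult_rows` are invented for illustration, and it indexes by row position rather than row name. It uses `max(1, length(i) - 1)` so that a single-row ID falls back to its only row.

```r
# Hypothetical sample data: "c" occurs only once
mydata <- data.frame(
  ID  = c("a", "a", "a", "b", "b", "b", "b", "b", "c"),
  num = c(1, 2, 3, 1, 2, 3, 4, 5, 1)
)

# for each ID, pick the second-to-last row position (or the only one)
idx <- tapply(seq_len(nrow(mydata)), mydata$ID,
              function(i) i[max(1, length(i) - 1)])

penult_rows <- mydata[as.integer(idx), ]
print(penult_rows)
```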

Licensed under: CC-BY-SA with attribution