Eliminate all but last duplicated elements from a data frame column in R

https://stackoverflow.com/questions/23242182

duplicates
r

08-07-2023

Question

I have a data frame like this:

    head(data)
      V1 V2 V3 V4    V5    V6      V7
1     a 1941 2 14 -73.90 38.60 US009239
2     b 1941 2 14 -74.00 36.90 US009239
3     c 1941 2 14 -74.00 35.40 US009239
4     a 1941 2 15 -74.00 34.00 US009239
5     d 1941 2 15 -74.00 32.60 US009239
6     f 1941 2 15 -73.80 31.70 US009239

I would like to remove the rows that are duplicates in data$V1 (each value of data$V1 appears at most twice). The problem is that if I do

newdata <- data[!duplicated(data$V1),]

it keeps the first occurrence of each duplicated value:

 head(newdata)
      V1 V2 V3 V4    V5    V6      V7
1     a 1941 2 14 -73.90 38.60 US009239
2     b 1941 2 14 -74.00 36.90 US009239
3     c 1941 2 14 -74.00 35.40 US009239
5     d 1941 2 15 -74.00 32.60 US009239
6     f 1941 2 15 -73.80 31.70 US009239

whereas I want to keep the second (last) occurrence:

   head(newdata)
          V1 V2 V3 V4    V5    V6      V7
    2     b 1941 2 14 -74.00 36.90 US009239
    3     c 1941 2 14 -74.00 35.40 US009239
    4     a 1941 2 15 -74.00 34.00 US009239
    5     d 1941 2 15 -74.00 32.60 US009239
    6     f 1941 2 15 -73.80 31.70 US009239

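Any help?

Solution

duplicated has a fromLast argument that should suit your needs. With fromLast=TRUE the vector is scanned from the end, so every occurrence of a value except the last one is flagged as a duplicate: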

duplicated(c('a', 'b', 'c', 'a', 'd', 'f'), fromLast=TRUE)
## [1]  TRUE FALSE FALSE FALSE FALSE FALSE
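
Applied to the data frame from the question, this might look like the sketch below. The data.frame() call only rebuilds the six rows shown in head(data) so that the snippet is self-contained; the column types are an assumption, since the question does not show str(data).

```r
# Rebuild the six example rows from the question (column types are assumed).
data <- data.frame(
  V1 = c("a", "b", "c", "a", "d", "f"),
  V2 = 1941,
  V3 = 2,
  V4 = c(14, 14, 14, 15, 15, 15),
  V5 = c(-73.9, -74.0, -74.0, -74.0, -74.0, -73.8),
  V6 = c(38.6, 36.9, 35.4, 34.0, 32.6, 31.7),
  V7 = "US009239",
  stringsAsFactors = FALSE
)

# fromLast = TRUE scans V1 from the end, so the last occurrence of each value
# is the one that is not flagged; negating the result keeps exactly those rows.
newdata <- data[!duplicated(data$V1, fromLast = TRUE), ]

print(newdata)
```

Here row 1 (the first "a") is dropped and rows 2 to 6 are kept, which matches the desired output in the question.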

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow