Find extra rows from data table merge in R

https://stackoverflow.com/questions/23518927

r
dataframe

17-07-2023

Question

I have a CSV file containing a list of 9541 pairs of point IDs and distances between them, and another file containing the same pairs but a different distance for each. I'm 99% sure they're the same pairs.

I've put them in tables a and b and merged them like this:

names(a) <- c('Point1', 'Point2', 'Distance')
names(b) <- c('Point1', 'Point2', 'Cheby')
m <- merge(a, b)

All good, except m has 8 more rows than I was expecting. I've tried merging with all.x=TRUE and all.y=TRUE as well, with the same results, and no fields are NA. How do I find out which rows are the 8 extra ones so I can figure out why they're there?

I've tried merging m back with a and b to see what rows have NA, but there aren't any. Even weirder, there are now 9565 rows. If I merge a small subset of the data frames, it works perfectly, but I wonder if there is a more elegant way of finding out what's going wrong than merging increasingly large subsets until I get an unexpected number of rows back.

Solution

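It sounds like some of the point pairs might be duplicated within a data frame. merge() returns every combination of rows whose key columns match, so a pair that appears twice in both a and b produces four rows in m instead of two. Try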

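# Drop the last column (the distance) so only Point1 and Point2 remain
a1 <- a[, -ncol(a)]
# duplicated() flags the second and later occurrences of each pair
a1[duplicated(a1), ]
b1 <- b[, -ncol(b)]
b1[duplicated(b1), ]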

to see if there are any duplicate points.
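To illustrate why duplicated pairs inflate the row count, here is a minimal sketch on made-up data. The values below are hypothetical and are not taken from the question's files; only the column names follow the question.

```r
# Hypothetical toy data: the pair (1, 2) appears twice in each table
a <- data.frame(Point1 = c(1, 1, 2),
                Point2 = c(2, 2, 3),
                Distance = c(5.0, 5.1, 7.0))
b <- data.frame(Point1 = c(1, 1, 2),
                Point2 = c(2, 2, 3),
                Cheby = c(4.0, 4.2, 6.0))

# merge() joins on the shared columns Point1 and Point2
m <- merge(a, b)

# 2 x 2 = 4 rows for the pair (1, 2), plus 1 row for (2, 3)
nrow(m)  # 5, although a and b have 3 rows each
```

Each table has only one repeated pair here, yet the merge gains two rows, which is the same pattern as a merge that comes back a few rows larger than its inputs.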

Edit: Also, to get all the rows in a that have duplicated points, you can do this:

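a1 <- a[, -ncol(a)]
# unique() keeps one copy of each repeated pair, so the merge below
# does not multiply rows when a pair occurs three or more times
duplicated_points_a <- unique(a1[duplicated(a1), ])
# Every row of a (with its Distance) whose pair occurs more than once
merge(duplicated_points_a, a)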
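If you also want to account for the exact number of extra rows, one possible approach is to count how often each pair occurs in each table and multiply the counts. This is only a sketch on hypothetical toy data; the helper names count_a, count_b, counts, n_a, n_b and rows_in_merge are made up for this example.

```r
# Hypothetical toy data: (1, 2) is repeated in both tables, (3, 4) only in b
a <- data.frame(Point1 = c(1, 1, 2, 3),
                Point2 = c(2, 2, 3, 4),
                Distance = c(5.0, 5.1, 7.0, 9.0))
b <- data.frame(Point1 = c(1, 1, 2, 3, 3),
                Point2 = c(2, 2, 3, 4, 4),
                Cheby = c(4.0, 4.2, 6.0, 8.0, 8.5))

# Count the occurrences of each (Point1, Point2) pair in each table
count_a <- aggregate(n_a ~ Point1 + Point2, data = transform(a, n_a = 1), FUN = sum)
count_b <- aggregate(n_b ~ Point1 + Point2, data = transform(b, n_b = 1), FUN = sum)

# A pair contributes n_a * n_b rows to merge(a, b)
counts <- merge(count_a, count_b)
counts$rows_in_merge <- counts$n_a * counts$n_b

# Pairs that contribute more than one row to the merged result
counts[counts$rows_in_merge > 1, ]

# The per-pair products add up to the size of the merged table
sum(counts$rows_in_merge) == nrow(merge(a, b))  # TRUE
```

Pairs that exist in only one table drop out of counts, just as they drop out of the default merge(a, b), so the totals stay comparable.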

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow