Question

I have two tab-separated data files (saved as .csv). The files are in the following format:

EP Code    EP Name    Address    Region    ...
101654    Alpha     York Street    Northwest    ...
103628    Beta    5th Avenue    South    ...

EP codes are unique. I want to compare the two files by EP code, find the rows that appear in only one of them, and write those rows to a new file.

For example, file1.csv has 800 rows and file2.csv has 850 rows. file2 could be a file completely including file1 plus 50 rows; or it could be file1 - 10 rows + 60 rows. I want to determine the differences between two data sets. I'm not interested in the mutual rows.

How can I do that in R?


Solution

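There are several ways to do this, including setdiff, intersect, the %in% operator and is.element. Note that setdiff and intersect return EP codes, not row positions, so their results cannot be used directly as row indices. Instead, build a logical vector with %in% and negate it with ! to keep only the rows whose code does not appear in the other file:

diff1 <- file1[!(file1$ep.code %in% file2$ep.code), ]

and

diff2 <- file2[!(file2$ep.code %in% file1$ep.code), ]

Here diff1 holds the rows found only in file1 and diff2 the rows found only in file2.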
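For an end-to-end view, here is a minimal sketch of reading both files, keeping the rows whose EP code appears in only one of them, and writing the result to a new file. It is an illustration under assumptions rather than a definitive solution: the temporary files, the sample rows "Gamma" and "Delta", and the variable names only1, only2 and differences are made up, and the first column is renamed to ep.code because read.delim would otherwise turn the header "EP Code" into EP.Code.

```r
# Create two small tab-separated files so the sketch runs on its own.
# The rows for codes 104200 and 105000 are invented sample data.
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
writeLines(c("EP Code\tEP Name\tAddress\tRegion",
             "101654\tAlpha\tYork Street\tNorthwest",
             "103628\tBeta\t5th Avenue\tSouth",
             "104200\tGamma\tMain Road\tEast"), f1)
writeLines(c("EP Code\tEP Name\tAddress\tRegion",
             "101654\tAlpha\tYork Street\tNorthwest",
             "103628\tBeta\t5th Avenue\tSouth",
             "105000\tDelta\tHigh Street\tWest"), f2)

# read.delim expects a header row and tab separators by default.
file1 <- read.delim(f1, stringsAsFactors = FALSE)
file2 <- read.delim(f2, stringsAsFactors = FALSE)

# Rename the key column (read in as "EP.Code") to the name used above.
names(file1)[names(file1) == "EP.Code"] <- "ep.code"
names(file2)[names(file2) == "EP.Code"] <- "ep.code"

# Rows whose EP code is missing from the other file.
only1 <- file1[!(file1$ep.code %in% file2$ep.code), ]
only2 <- file2[!(file2$ep.code %in% file1$ep.code), ]

# Stack both sets of unmatched rows and write them out, tab-separated.
differences <- rbind(only1, only2)
out <- tempfile(fileext = ".csv")
write.table(differences, out, sep = "\t", row.names = FALSE, quote = FALSE)
```

With real data, replace f1, f2 and out with the actual file paths. Because the comparison uses only the EP code, a row whose code exists in both files is treated as shared even if its other columns differ.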
Licensed under: CC-BY-SA with attribution