Aggregate a data.frame without a function

Question

The problem as you state seems to be removing duplicate rows from a data.frame and this does not require any aggregation. Based on your example this is what you're after:

unique(test.df[c(1,3,4)])
# id x1 x2
#1  A  1  A
#4  B  2  B

EDIT:

I don't quite get as to what do you mean by:

"I tried with FUN=unique but does not seem to work."

Just for the sake of explaining as to what you might have gotten with aggregate wrong, here, I show how one could get the same with aggregate:

test.df$x2 <- as.character(test.df$x2)
aggregate(. ~ id, FUN=unique , data = test.df[c(1,3,4)] )

#  id x1 x2
#1  A  1  A
#2  B  2  B

However, there is no need to use aggregate() here. It's terribly inefficient for this problem. You can check it out with system.time(.) which already gives a difference even on this data:

system.time(unique(test.df[c(1,3,4)]))
#    user  system elapsed 
#   0.001   0.000   0.001 
system.time(aggregate(. ~ id, FUN=unique , data = test.df[c(1,3,4)] ))
#    user  system elapsed 
#   0.004   0.000   0.004

Go ahead and run this on your million rows and check your results with identical and have a look at the run time.

From your comments I think you're confused with the behaviour of unique. As @mnel explains, it (unique.data.frame) removes all duplicate rows alone from the given data.frame. It works for your case because you say that x1 and x2 will have the same values for each ID. So, you dont have to know where in the data.frame ID is. You just have to pick 1 row for each ID.