Function for sampling between duplicated values in data.frame

Question 1

If you want to keep a randomly chosen duplicate, rather than duplicated's default behaviour of keeping only the first, why not randomly shuffle the whole dataset, so that the first occurrence in the shuffled set is effectively a random row from the original:

DATAr <- DATA[sample(nrow(DATA)), ]
DATAr <- DATAr[!duplicated(DATAr$Point),]

If the order of your original DATA was important, store the result of sample(...) in a variable, use it to reorder your data, and apply the inverse permutation once you've removed duplicates (or add a column DATA$ind <- 1:nrow(DATA) before shuffling and sort on it afterwards to restore the original order).
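The index-column variant could be sketched as follows. This is a minimal illustration, assuming a toy data frame; the `value` column and the seed are made up for the example, and `ind` is the helper column mentioned above.

```r
# Toy data (hypothetical): Point contains duplicates
set.seed(1)
DATA <- data.frame(Point = c(1, 1, 2, 3, 3, 3), value = 1:6)

DATA$ind <- seq_len(nrow(DATA))             # remember the original row order
DATAr <- DATA[sample(nrow(DATA)), ]         # shuffle the rows
DATAr <- DATAr[!duplicated(DATAr$Point), ]  # keep the first row per Point in shuffled order
DATAr <- DATAr[order(DATAr$ind), ]          # restore the original ordering
DATAr$ind <- NULL                           # drop the helper column
print(DATAr)
```

Which of the duplicated rows survives depends on the shuffle, but the surviving rows come back in their original relative order.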

Question 2

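R has built-in functions sample and duplicated. Note that sample draws from the vector it is given, so pass it row indices (via which) rather than the logical vector itself:

DATA[ sample( which(!duplicated(DATA$Point)), N ), ]
# where `N` is the sample size you'd like (at most the number of distinct Point values).

In data.table syntax, the above would be

DATA[ sample( which(!duplicated(Point)), N ) ]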

Question 3

So you want every row that is not duplicated, plus the first instance of each one that is duplicated, right?

Then try this:

# build fake dataset
DATA <- as.data.frame(cbind(sample(c(1:10,3:7)),sample(1:15),sample(1:15)))
names(DATA) <- c("Point","some_col","some_other_col")

# check
print(DATA) # See Point has duplicate values


# your function
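filter_data <- function(DATA){
  # distinct values of Point, in order of first appearance
  distinct_points <- unique(DATA$Point)
  # take the first row for each value and bind the rows back into one data.frame
  do.call(rbind, lapply(distinct_points, function(x){subset(DATA, Point == x)[1,]}))
}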


#result
DATA.new <- filter_data(DATA)
print(DATA.new)
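
For comparison, the same "first row per Point" selection can probably be written without a per-value loop by indexing with `!duplicated(...)`. A minimal sketch, assuming the same column names as the fake dataset; the seed and the name `DATA.first` are only for illustration:

```r
# Rebuild a fake dataset with duplicated Point values (seed is arbitrary)
set.seed(42)
DATA <- data.frame(Point          = sample(c(1:10, 3:7)),
                   some_col       = sample(1:15),
                   some_other_col = sample(1:15))

# duplicated() flags every repeat after the first occurrence,
# so negating it keeps exactly one row (the first) per Point
DATA.first <- DATA[!duplicated(DATA$Point), ]
print(DATA.first)
```

This keeps the original column types and row names, and avoids scanning the whole data frame once per distinct value.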