Merge two R data frames and identify the source of each row

Question 1

I would just add another column before merging to make life easier:

example_df1$source <- "X"
example_df2$source <- "Y"
Merged <- merge(x = example_df1, y = example_df2,
                all = TRUE, by = c("subject_id", "gender", "weight"))
Merged$rowSource <- apply(Merged[c("source.x", "source.y")], 1, 
                          function(x) paste(na.omit(x), collapse = ""))
Merged
#   subject_id gender weight score source.x site1 site2 site3 source.y rowSource
# 1        101      M    120    10        X    13     3    31        Y        XY
# 2        102      F    130    12        X    NA    NA    NA     <NA>         X
# 3        102      M    130    NA     <NA>    18     7    28        Y         Y
# 4        103      M    110    11        X    23     8    12        Y        XY
# 5        104      M    114    13        X    NA    NA    NA     <NA>         X
# 6        104      M    117    NA     <NA>    12    11    29        Y         Y
# 7        105      F    144    11        X     4     0    40        Y        XY

From there, it is easy to change "XY" to "both" if you prefer that in your output, and you can then drop the "source.x" and "source.y" columns.
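For what it's worth, here is a minimal sketch of that last step. The two input data frames are not shown in this answer, so the ones below are reconstructed from the merged output printed above and should be treated as an assumption. The rest repeats the merge and then recodes and cleans up.

```r
# Example data reconstructed from the merged output above (an assumption)
example_df1 <- data.frame(
  subject_id = c(101, 102, 103, 104, 105),
  gender     = c("M", "F", "M", "M", "F"),
  weight     = c(120, 130, 110, 114, 144),
  score      = c(10, 12, 11, 13, 11),
  stringsAsFactors = FALSE
)
example_df2 <- data.frame(
  subject_id = c(101, 102, 103, 104, 105),
  gender     = c("M", "M", "M", "M", "F"),
  weight     = c(120, 130, 110, 117, 144),
  site1      = c(13, 18, 23, 12, 4),
  site2      = c(3, 7, 8, 11, 0),
  site3      = c(31, 28, 12, 29, 40),
  stringsAsFactors = FALSE
)

# Tag each data frame, then do a full outer merge on the shared keys
example_df1$source <- "X"
example_df2$source <- "Y"
Merged <- merge(x = example_df1, y = example_df2,
                all = TRUE, by = c("subject_id", "gender", "weight"))
Merged$rowSource <- apply(Merged[c("source.x", "source.y")], 1,
                          function(x) paste(na.omit(x), collapse = ""))

# Recode rows found in both inputs, then drop the helper columns
Merged$rowSource[Merged$rowSource == "XY"] <- "both"
Merged$source.x <- NULL
Merged$source.y <- NULL
Merged
```

Assigning NULL to a column is the usual way to delete it from a data frame.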

Question 2

This does it all in one merging step and does not modify the original data frames:

mm <- transform(
    merge(
        x = cbind(example_df1, source = "x"),
        y = cbind(example_df2, source = "y"),
        all = TRUE, by = intersect(names(example_df1), names(example_df2))),
    source = ifelse(!is.na(source.x) & !is.na(source.y), "both",
        ifelse(!is.na(source.x), "x", "y")),
    source.x = NULL,
    source.y = NULL
)
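As a quick check of the "does not modify the originals" claim, here is a small sketch. The two-row data frames are stand-ins I made up from values in the output shown earlier, and `names_before` is just a helper name for this example.

```r
# Two-row stand-ins for the example data (assumed, not the real inputs)
example_df1 <- data.frame(subject_id = c(101, 102), gender = c("M", "F"),
                          weight = c(120, 130), score = c(10, 12),
                          stringsAsFactors = FALSE)
example_df2 <- data.frame(subject_id = c(101, 102), gender = c("M", "M"),
                          weight = c(120, 130), site1 = c(13, 18),
                          stringsAsFactors = FALSE)

names_before <- names(example_df1)

# Same one-step merge as above: cbind() tags temporary copies only
mm <- transform(
  merge(x = cbind(example_df1, source = "x"),
        y = cbind(example_df2, source = "y"),
        all = TRUE, by = intersect(names(example_df1), names(example_df2))),
  source = ifelse(!is.na(source.x) & !is.na(source.y), "both",
                  ifelse(!is.na(source.x), "x", "y")),
  source.x = NULL,
  source.y = NULL
)

# The inputs still have their original columns (no "source" added)
identical(names(example_df1), names_before)

# One row per origin: matched in both, only in x, only in y
table(mm$source)
```

Because the tagging happens inside the call to merge(), the extra column only ever exists on the temporary copies.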

Question 3

Thanks again for the answers. Once I saw the solution of just using cbind() to attach the source variable to the data frame, it was easy. I wrote a simple function that does it, which I'm sharing here.

merge_with_source <- function(x,y,name.x="X",name.y="Y") {

    # Find the variables that the two data frames have in common
    merge.names <- intersect(names(x),names(y))

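    # Next, attach a column to each data frame with the chosen name.
    # stringsAsFactors=FALSE keeps the column as character, so the ""
    # assignment below also works on R versions before 4.0
    x.df <- cbind(x,datsrc=name.x,stringsAsFactors=FALSE)
    y.df <- cbind(y,datsrc=name.y,stringsAsFactors=FALSE)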

    # Create a merged data frame on the common names
    merged.df <- merge(x=x.df,
                       y=y.df,
                       all=TRUE,
                       by=merge.names)

    # Eliminate NAs from the data source column
    merged.df[is.na(merged.df$datsrc.x),"datsrc.x"] <- ""
    merged.df[is.na(merged.df$datsrc.y),"datsrc.y"] <- ""

    # Paste the data source columns together to get a single variable
    # Then, note those that are "Both" by replacing the mangled name
    merged.df$datsrc <- paste(merged.df$datsrc.x,merged.df$datsrc.y,sep="")
    merged.df[merged.df$datsrc==paste(name.x,name.y,sep=""),"datsrc"] <- "Both"

    # Remove the data frame-specific variables
    # (assigning NULL is the idiomatic way to drop a column)
    merged.df$datsrc.x <- NULL
    merged.df$datsrc.y <- NULL

    return(merged.df)
}
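In case it helps anyone, here is a rough usage sketch with custom source labels. To keep it runnable on its own, it includes a condensed variant of the function (same idea, but using ifelse() instead of pasting the labels together). The two-row data frames and the labels "scores" and "sites" are made up for illustration.

```r
# Condensed variant of merge_with_source() so this block runs standalone
merge_with_source <- function(x, y, name.x = "X", name.y = "Y") {
  # Keys are the columns the two inputs share (computed before tagging)
  merge.names <- intersect(names(x), names(y))

  # Tag the local copies only; the caller's data frames are not changed
  x$datsrc <- name.x
  y$datsrc <- name.y

  merged.df <- merge(x = x, y = y, all = TRUE, by = merge.names)

  # A tag is NA when the row had no match on that side
  in.x <- !is.na(merged.df$datsrc.x)
  in.y <- !is.na(merged.df$datsrc.y)
  merged.df$datsrc <- ifelse(in.x & in.y, "Both",
                             ifelse(in.x, name.x, name.y))

  # Drop the per-side helper columns
  merged.df$datsrc.x <- NULL
  merged.df$datsrc.y <- NULL
  merged.df
}

# Hypothetical inputs, loosely based on the example data in this thread
example_df1 <- data.frame(subject_id = c(101, 102), gender = c("M", "F"),
                          weight = c(120, 130), score = c(10, 12),
                          stringsAsFactors = FALSE)
example_df2 <- data.frame(subject_id = c(101, 102), gender = c("M", "M"),
                          weight = c(120, 130), site1 = c(13, 18),
                          stringsAsFactors = FALSE)

out <- merge_with_source(example_df1, example_df2,
                         name.x = "scores", name.y = "sites")
out
```

Using ifelse() on the NA pattern avoids comparing against the pasted label, which could misfire if one label happened to equal the concatenation of both.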