This looks like a bug (or perhaps selecting multiple columns from a groupby is simply not implemented?). A workaround is to pass in the groupby column directly:
In [11]: df[['X', 'Y']].groupby(df['group']).apply(RMSE)
Out[11]:
group
A 4.472136
B 4.472136
dtype: float64
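For anyone who wants to try the workaround outside this session, here is a minimal sketch. The data and the `rmse_by_position` function are made up for illustration: the original `df` and `RMSE` are not shown in this answer, so I am only assuming that `RMSE` compares the first two columns by position.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the frame from the question is not reproduced here.
df = pd.DataFrame({
    "dummy": [10, 20, 30, 40],
    "group": ["A", "A", "B", "B"],
    "X": [1.0, 2.0, 0.0, 0.0],
    "Y": [3.0, 4.0, 3.0, 4.0],
})

def rmse_by_position(chunk):
    # Assumed shape of the original RMSE: compare the first two columns
    # by position and take the root of the summed squared differences.
    return np.sqrt(np.sum((chunk.iloc[:, 0] - chunk.iloc[:, 1]) ** 2))

# Select the value columns first, then group by the 'group' Series itself,
# so each chunk passed to the function contains only X and Y.
by_series = df[["X", "Y"]].groupby(df["group"]).apply(rmse_by_position)
print(by_series)
```

Because the grouping key is passed as a Series rather than a column name, the chunks never contain the `dummy` or `group` columns, so a positional function sees exactly the two columns it expects.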
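To see that this matches the correct result, compare selecting the columns on the groupby (wrong) with doing the same after dropping the dummy column (correct):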
In [12]: df.groupby('group')[['X', 'Y']].apply(RMSE) # wrong
Out[12]:
group
A 8.944272
B 7.348469
dtype: float64
In [13]: df.iloc[:, 1:].groupby('group')[['X', 'Y']].apply(RMSE) # correct: ignore dummy col
Out[13]:
group
A 4.472136
B 4.472136
dtype: float64
More robust implementation:
To avoid this completely, you could change RMSE to select the columns by name:
In [21]: def RMSE2(X, left_col, right_col):
   ....:     return np.sqrt(np.sum((X[left_col] - X[right_col])**2))
In [22]: df.groupby('group').apply(RMSE2, 'X', 'Y') # equivalent to passing lambda x: RMSE2(x, 'X', 'Y')
Out[22]:
group
A 4.472136
B 4.472136
dtype: float64
Thanks to @naught101 for pointing out the sweet apply syntax to avoid the lambda.
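A self-contained sketch of the by-name approach, in case it helps. The data is hypothetical and `rmse_by_name` is just `RMSE2` under a different name. I select `[['X', 'Y']]` on the groupby here only to keep newer pandas versions from warning about the grouping column being passed to the function; since the columns are looked up by name, the extra `dummy` column would not matter either way.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the frame from the question is not reproduced here.
df = pd.DataFrame({
    "dummy": [10, 20, 30, 40],
    "group": ["A", "A", "B", "B"],
    "X": [1.0, 2.0, 0.0, 0.0],
    "Y": [3.0, 4.0, 3.0, 4.0],
})

def rmse_by_name(chunk, left_col, right_col):
    # Columns are looked up by name, so column order and extra columns
    # in the chunk do not affect the result.
    return np.sqrt(np.sum((chunk[left_col] - chunk[right_col]) ** 2))

grouped = df.groupby("group")[["X", "Y"]]

# Extra positional arguments to apply are forwarded to the function.
result = grouped.apply(rmse_by_name, "X", "Y")

# The same call spelled out with a lambda.
via_lambda = grouped.apply(lambda g: rmse_by_name(g, "X", "Y"))

print(result)
```

Both spellings should give the same Series, indexed by group.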