You cannot compare these results at all, for at least three reasons:
- different metrics
- different datasets -- the reported results are not from experiments on the same dataset
- different tasks -- rating prediction is different from item recommendation -- you would not say that predicting the price of something and detecting spam are the same task, would you?
Why would you want to compare such different methods anyway? You may want to re-read the first paper: it uses implicit feedback as an additional signal to compute better rating predictions, whereas the second paper uses implicit feedback to predict implicit feedback.
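
To make the mismatch in metrics and tasks concrete, here is a minimal Python sketch. The metrics the two papers actually report are not restated here, so RMSE and precision@k are only assumed stand-ins for a typical rating-prediction metric and a typical item-recommendation metric. The function names and toy data are made up for illustration.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and observed ratings."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def precision_at_k(ranked_items, relevant_items, k):
    """Fraction of the top-k ranked items that the user actually interacted with."""
    top_k = ranked_items[:k]
    return sum(1 for item in top_k if item in relevant_items) / k

# Rating prediction: how far off are the predicted star ratings? (lower is better)
predicted_ratings = [4.0, 3.0, 5.0]   # hypothetical model output
actual_ratings = [5.0, 3.0, 4.0]      # hypothetical held-out ratings
rating_error = rmse(predicted_ratings, actual_ratings)

# Item recommendation: how many of the top-ranked items were relevant? (higher is better)
ranked = ["a", "b", "c", "d"]         # hypothetical ranking for one user
relevant = {"a", "d"}                 # hypothetical held-out interactions
ranking_quality = precision_at_k(ranked, relevant, k=2)

print(rating_error, ranking_quality)
```

The first number is an error on a rating scale, the second is a hit rate over a ranked list. They differ in unit and in direction, so putting them side by side says nothing about which method is better.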