It's hard to judge from such a small, synthetic data set. I think the biggest factor here will be the parameters: how many features are you using, and what is lambda? I would expect features = 2 to be about right here. If it's higher, I think you quickly over-fit, and the results are mostly the noise left over after the model perfectly explains that user 11 doesn't interact with 222 and 333.
The values are quite low, which suggests that neither of these is a likely result, so their relative order may be more noise than anything. Do you see different results if the model is rebuilt from another random starting point?
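
If it helps, here is a rough sketch of the kind of check I mean, not your actual setup. It runs a plain regularized ALS in NumPy on a made-up 0/1 matrix, treating every cell as observed, which is probably simpler than what your implementation does. The matrix `R`, the function names `als` and `compare_seeds`, and the parameter values are all hypothetical. The point is only to rebuild from several random starting points and see whether the order of the two low scores holds.

```python
import numpy as np

def als(R, features=2, lam=0.1, iters=20, seed=0):
    """Plain regularized ALS on a dense matrix (assumption: every cell is observed)."""
    rng = np.random.default_rng(seed)
    n_items = R.shape[1]
    # Random starting point for the item factors; this is what the seed changes.
    Y = rng.normal(scale=0.1, size=(n_items, features))
    reg = lam * np.eye(features)
    for _ in range(iters):
        # Fix item factors, solve the ridge problem for user factors.
        X = np.linalg.solve(Y.T @ Y + reg, Y.T @ R.T).T
        # Fix user factors, solve the ridge problem for item factors.
        Y = np.linalg.solve(X.T @ X + reg, X.T @ R).T
    return X @ Y.T  # predicted score for every (user, item) pair

def compare_seeds(R, user, item_a, item_b, seeds=range(5), **kw):
    """Return (seed, score_a, score_b) for each random starting point."""
    rows = []
    for seed in seeds:
        scores = als(R, seed=seed, **kw)
        rows.append((seed, scores[user, item_a], scores[user, item_b]))
    return rows

# Hypothetical toy data: rows are users, columns are items.
# User 0 has not interacted with items 2 and 3.
R = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 1, 0],
], dtype=float)

for features in (2, 4):
    print(f"features={features}")
    for seed, a, b in compare_seeds(R, user=0, item_a=2, item_b=3,
                                    features=features, lam=0.1):
        print(f"  seed={seed}  item2={a:.4f}  item3={b:.4f}")
```

If the order of the two scores flips between seeds, or only settles down at a lower number of features, I would read that as a sign that the ranking is mostly noise.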