Too much data and not enough machines in our Hadoop cluster for Mahout Item-based CF

Question

You seem to have the standard scaling problem of recommendation systems. In your case you should split your analysis into multiple parts.

The item-item similarity calculation part.
The user-item recommendation part using the item-item similarity values.

The point is, that similarity between items having a lot of ratings doesn't change a lot. And exactly this is the costly part. This means you can calculate the similarity for them only once and do it again after a long time (weeks, months?). You can evaluate how much they change after a week, two weeks etc. Then you only need to calculate the item-item similarity for items with fewer ratings every day - if they have new ratings of course! Too few ratings are a problem for itself in the recommendation engine area. I won't go into this right now.

So, when you have your always up-to-date item-item-similarity list, you can do the user-item recommendation based on them. If the amount of your items doesn't change that much then this is a constant time operation. That could be done in real-time when the user access the app. So no need to calculate the recommendation for a user which never comes back. The predicted rating for a user-item is basically the sum of all items rated by that user weighted by the items similarity score. You need to check if mahout is providing