I would recommend distributed computing frameworks to you, but I think this is still at a scale you can easily handle on one machine.
Apache Mahout contains the Taste collaborative filtering library, which was designed to scale well on one machine. A model of, say, 10M data points should fit in memory with a healthy heap size. Look at classes like GenericItemBasedRecommender and FileDataModel.
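For a sense of how little code that takes, here is a minimal sketch of a non-distributed item-based recommender in Taste. The input file name, the choice of log-likelihood as the similarity metric, and the example user ID are illustrative assumptions, not something from the original answer:

```java
import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class TasteExample {
  public static void main(String[] args) throws Exception {
    // ratings.csv holds one "userID,itemID,preference" triple per line (assumed path)
    DataModel model = new FileDataModel(new File("ratings.csv"));

    // Log-likelihood similarity is one reasonable choice; any other
    // ItemSimilarity implementation could be swapped in here
    ItemSimilarity similarity = new LogLikelihoodSimilarity(model);
    GenericItemBasedRecommender recommender =
        new GenericItemBasedRecommender(model, similarity);

    // Top 10 recommendations for user 42 (made-up example ID)
    List<RecommendedItem> items = recommender.recommend(42L, 10);
    for (RecommendedItem item : items) {
      System.out.println(item.getItemID() + " : " + item.getValue());
    }
  }
}
```

FileDataModel loads the whole file into memory, which is why the healthy heap size mentioned above matters at this scale.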
(Mahout also has distributed implementations based on Hadoop, but I don't think you need this yet.)
I'm the author of that, but have since moved on to commercialize large-scale recommenders as Myrrix. It includes a stand-alone single-machine version, which is free and open source, and which will also easily handle this amount of data on one machine; your data set is smaller than the one used in this example. Myrrix also has a distributed implementation.
There are other fast distributed implementations beyond the above, like GraphLab. Other non-distributed frameworks are also probably fast enough, like MyMediaLite.
I would suggest just using one of these. If you really are wondering "how" it happens under the hood, dig into the source code and look at the data representation; there's a brief sketch of Taste's below.
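To illustrate the data-representation point: Taste holds preferences in compact parallel primitive arrays per user, keyed by primitive-ID maps, rather than as one object per rating, and that is what makes tens of millions of data points feasible in one heap. A minimal sketch of building an in-memory model this way (the IDs and values are made up):

```java
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.model.GenericDataModel;
import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;

public class DataRepresentationExample {
  public static void main(String[] args) throws Exception {
    // One PreferenceArray per user: backed by long/float arrays,
    // not by a heavyweight object per rating
    PreferenceArray userPrefs = new GenericUserPreferenceArray(2);
    userPrefs.setUserID(0, 1L);   // sets the user ID for the whole array
    userPrefs.setItemID(0, 101L);
    userPrefs.setValue(0, 4.5f);
    userPrefs.setItemID(1, 102L);
    userPrefs.setValue(1, 3.0f);

    // FastByIDMap keys on primitive longs, avoiding Long boxing overhead
    FastByIDMap<PreferenceArray> data = new FastByIDMap<PreferenceArray>();
    data.put(1L, userPrefs);

    DataModel model = new GenericDataModel(data);
    System.out.println(model.getNumUsers() + " user(s) in memory");
  }
}
```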