Question

I am building a simple recommendation system on Hadoop. Can you give me an opinion on what to use to build it?

I would like to use Apache Pig or Apache Mahout.

My data set contains:

book_id,name,publisher
user_id,username
book_id,user_id,rating

I have my data in CSV format.

So can you please suggest which technology to use to produce an item-based and a user-based recommendation system?

Solution

Apache Mahout will provide you with an off-the-shelf recommendation engine based on collaborative filtering algorithms.
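
For example, a minimal sketch of both a user-based and an item-based recommender using Mahout's Taste API could look like the following. The file name, user ID, and similarity/neighborhood choices are only illustrative, and note that FileDataModel expects lines in user_id,item_id,rating order, so your book_id,user_id,rating file would need its columns swapped first.

```java
import java.io.File;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class BookRecommenderSketch {

    public static void main(String[] args) throws Exception {
        // ratings.csv (illustrative name): user_id,book_id,rating per line
        DataModel model = new FileDataModel(new File("ratings.csv"));

        // User-based: recommend from the 10 most similar users (Pearson correlation)
        UserSimilarity userSim = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, userSim, model);
        Recommender userBased = new GenericUserBasedRecommender(model, neighborhood, userSim);

        // Item-based: similarity is computed between books instead of between users
        ItemSimilarity itemSim = new PearsonCorrelationSimilarity(model);
        Recommender itemBased = new GenericItemBasedRecommender(model, itemSim);

        long userId = 1L; // illustrative user ID
        for (RecommendedItem item : userBased.recommend(userId, 5)) {
            System.out.println("user-based: book " + item.getItemID() + " score " + item.getValue());
        }
        for (RecommendedItem item : itemBased.recommend(userId, 5)) {
            System.out.println("item-based: book " + item.getItemID() + " score " + item.getValue());
        }
    }
}
```

The Taste API above runs on a single machine against the CSV; for data that genuinely needs Hadoop, Mahout also ships MapReduce-based recommender jobs (for example the item-based RecommenderJob) that take similar user,item,rating input from HDFS.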

With Pig you will have to implement those algorithms yourself - in Pig Latin, which may be a rather complex task.

OTHER TIPS

I know it's not one of your preferred methods, but another product you can use on Hadoop to create a recommendation engine is Oryx.

Oryx was created by Sean Owen (co-author of the book Mahout in Action and a major contributor to the Mahout code base). It only has three algorithms at the moment (Alternating Least Squares, K-Means Clustering, and Random Decision Forests), but the ALS algorithm provides a fairly easy-to-use collaborative filtering engine sitting on top of the Hadoop infrastructure.

From the brief description of your dataset, it sounds like it would work perfectly. It has a model generation engine (the computational layer), and it can generate a new model based on one of three criteria:

1) Age (time between model generations)
2) Number of records added
3) Amount of data added

Once a generation of the model has been built, another Java daemon (the serving layer) serves out the recommendations (user-to-item, item-to-item, blind recommendations, etc.) via a RESTful API. When a new generation of the model is created, it will automatically pick up that generation and serve it out.
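
Querying the serving layer is then just an HTTP GET. Here is a rough sketch, assuming the daemon listens on localhost:8080 and exposes a /recommend/<user_id> endpoint; the exact paths, port, and response format depend on your Oryx version, so check its documentation.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class OryxRecommendClient {

    public static void main(String[] args) throws Exception {
        // Assumed host, port, and endpoint path -- adjust to your Oryx setup
        String userId = "123"; // illustrative user ID
        URL url = new URL("http://localhost:8080/recommend/" + userId);

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        // Print the raw response; typically a list of item_id,score lines
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```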

There are some nice features in the model generation as well, such as aging historic data, which can help get around issues like seasonality (probably not a big deal if you're talking about books, though).

The computational layer (model generation) uses HDFS to store and look up data, and uses MapReduce or YARN for job control. The serving layer is a daemon that can run on each data node; it reads the computed model data from HDFS and presents it over the API.

Licensed under: CC-BY-SA with attribution