Question

I'm new to mahout and still trying to figure things out.

I'm trying to run a KNN-based recommender with Mahout 0.8 on a Hadoop cluster (a distributed recommender). Since I'm using Mahout 0.8, KNN is deprecated, but it is still usable (at least when I set it up in Java code).

I have several questions:

  1. Is it true that there are basically two Mahout implementations: distributed (run from the command line) and non-distributed (run from a jar file)?

  2. Assuming (1) is correct, does Mahout support running a KNN-based recommender from the command line? Can someone point me in the right direction?

  3. Assuming (1) is wrong, how can I build a recommender in Java (I'm using Eclipse) that runs on a Hadoop cluster (distributed)?

Thanks!

Solution

KNN is being deprecated because it is being replaced with the item-based and user-based cooccurrence recommenders and the ALS-WR recommender, which are better and more modern.

  1. Yes, but not all of the code has a CLI interface. For the most part, the CLI jobs in Mahout are Hadoop/distributed jobs that write their output to files in HDFS. These can be run from jar files with your own code wrapping them, just as you must wrap the local/non-distributed/non-Hadoop versions, which have no CLI. The in-memory recommenders require you to pass in a user ID to get recommendations, so you have to write code to do that (a minimal sketch follows this list). The Hadoop versions do have a CLI, since they precalculate all recommendations for all users and write them to files; you'll probably insert those into your DB or serve them up some other way.
  2. No, to my knowledge only the user-based, item-based, and ALS-WR recommenders are supported from the command line. The CLI runs the Hadoop/distributed version of these recommenders. This can work on a single machine, of course, even using the local filesystem, since Hadoop can be set up that way.
  3. For the in-memory recommenders, just write your driver code and run it in Eclipse; since Hadoop is not involved, it works fine. If you want to use the Hadoop versions, set up Hadoop on your dev machine to run locally using the local filesystem, and everything works fine in Eclipse (the second sketch after this list shows one way to launch the Hadoop job from Java). Once you have things debugged, move it to your Hadoop cluster. You can also debug remotely on the cluster, but that is another question altogether.
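To make (1) concrete, here is a minimal sketch of an in-memory (Taste) recommender driver, assuming Mahout 0.8 on the classpath; the ratings.csv path and the user ID are placeholders. The deprecated KNN item-based recommender is wired up the same way, just with a different Recommender implementation.

```java
import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class InMemoryRecommenderExample {

  public static void main(String[] args) throws Exception {
    // ratings.csv is a placeholder: one userID,itemID,preference triple per line
    DataModel model = new FileDataModel(new File("ratings.csv"));
    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
    UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
    Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

    // There is no CLI here: you must pass in a user ID yourself (1L is a placeholder)
    List<RecommendedItem> recs = recommender.recommend(1L, 5);
    for (RecommendedItem rec : recs) {
      System.out.println(rec.getItemID() + " : " + rec.getValue());
    }
  }
}
```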
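For (2) and (3), the Hadoop/distributed item-based job that the `mahout recommenditembased` command runs can also be launched from your own Java code via ToolRunner, which is one way to drive the distributed recommender from Eclipse. A sketch, where the HDFS paths are placeholders:

```java
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.cf.taste.hadoop.item.RecommenderJob;

public class DistributedRecommenderDriver {

  public static void main(String[] args) throws Exception {
    // Placeholder paths: input holds userID,itemID,preference triples in HDFS,
    // output receives the precalculated recommendations for all users
    String[] jobArgs = {
        "--input", "/user/me/ratings",
        "--output", "/user/me/recommendations",
        "--similarityClassname", "SIMILARITY_LOGLIKELIHOOD",
        "--numRecommendations", "10"
    };
    ToolRunner.run(new RecommenderJob(), jobArgs);
  }
}
```

With Hadoop configured to run locally against the local filesystem, the same driver works on a dev machine before you move it to the cluster.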

The latest thing in Mahout recommenders is one that is trained in the background using Hadoop, with the output then indexed by Solr. You then query Solr with items the user has expressed a preference for; there is no need to precalculate all recommendations for all users, since recs are returned from a Solr query in near realtime. This is in Mahout 1.0-SNAPSHOT's mahout/examples/ or here: https://github.com/pferrel/solr-recommender
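To illustrate the query pattern, here is a hedged SolrJ sketch; the Solr URL, the indicators field name, and the item IDs are assumptions standing in for whatever schema the solr-recommender indexing step actually produces.

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class SolrRecommenderQueryExample {

  public static void main(String[] args) throws Exception {
    // URL and field name are hypothetical; they depend on how the index was built
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    // Query the indexed item-item indicators with items this user has preferred
    SolrQuery query = new SolrQuery("indicators:(item17 OR item42 OR item99)");
    query.setRows(10);

    QueryResponse response = solr.query(query);
    for (SolrDocument doc : response.getResults()) {
      System.out.println(doc.getFieldValue("id"));
    }
  }
}
```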

BTW, this code is being integrated into Mahout 1.0 and moved to run on Spark instead of Hadoop, so even the training step will be much, much faster.

Update: I've clarified what can be run from the CLI above.
