How to use startPhase in Mahout

Question 1

Here is what I found:

Phase 0 runs PreparePreferenceMatrixJob, which consists of 3 Hadoop jobs:

PreparePreferenceMatrixJob-ItemIDIndexMapper-Reducer
PreparePreferenceMatrixJob-ToItemPrefsMapper-Reducer
PreparePreferenceMatrixJob-ToItemVectorsMapper-Reducer

Phase 1 runs RowSimilarityJob, which consists of 3 jobs:

RowSimilarityJob-VectorNormMapper-Reducer
RowSimilarityJob-CooccurrencesMapper-Reducer
RowSimilarityJob-UnsymmetrifyMapper-Reducer

Phase 2 is part of RecommenderJob and has 3 jobs:

RecommenderJob-SimilarityMatrixRowWrapperMapper-Reducer
RecommenderJob-UserVectorSplitterMapper-Reducer
RecommenderJob-Mapper-Reducer

Phase 3 is the last one and has only one job:

RecommenderJob-PartialMultiplyMapper-Reducer

Also, the output of phase 1 in the RecommenderJob class is exactly the same as the output of phases 0 and 1 of ItemSimilarityJob (only the temp directory names differ).

Question 2

Yes, that's correct. It's a fairly crude mechanism: it just controls which of a series of MapReduce jobs are run. You have to read the code to know what the phases are, yes. They vary from job to job.
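To make the mechanism concrete, here is a minimal sketch of counter-based phase gating. It is a simplified model, not Mahout's actual code: the class name `PhaseGateSketch` and the helper `phasesThatRun` are made up for illustration, and it assumes a phase is skipped when its index is below the start phase or above the end phase (the values you would pass as `--startPhase` and `--endPhase`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Simplified, hypothetical model of phase gating; not taken from Mahout. */
final class PhaseGateSketch {

    /**
     * Decides whether the phase at the counter's current value should run,
     * then advances the counter. A null bound means "no limit on that side".
     */
    static boolean shouldRunNextPhase(Integer startPhase, Integer endPhase,
                                      AtomicInteger currentPhase) {
        int phase = currentPhase.getAndIncrement();
        boolean skipped = (startPhase != null && phase < startPhase)
                || (endPhase != null && phase > endPhase);
        return !skipped;
    }

    /** Walks totalPhases phases in order and collects the indices that would run. */
    static List<Integer> phasesThatRun(int totalPhases, Integer startPhase, Integer endPhase) {
        AtomicInteger currentPhase = new AtomicInteger();
        List<Integer> ran = new ArrayList<>();
        for (int i = 0; i < totalPhases; i++) {
            // In a real driver, the MapReduce jobs of phase i would be launched here.
            if (shouldRunNextPhase(startPhase, endPhase, currentPhase)) {
                ran.add(i);
            }
        }
        return ran;
    }

    public static void main(String[] args) {
        System.out.println(phasesThatRun(4, 2, null)); // prints [2, 3]
        System.out.println(phasesThatRun(4, 1, 2));    // prints [1, 2]
    }
}
```

Note that under this model every skipped phase still advances the counter, so the phase numbers only make sense if you know the order in which the driver launches its jobs. That is why you end up reading the code.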

If I were doing it over again, I would have just made it detect the presence of existing output to know which jobs to skip. (That's what I've done in my next-gen recommender project.)
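A rough sketch of what output-based skipping could look like is below. This is not the code from that project; `OutputSkipSketch` and `runIfNeeded` are hypothetical names, the local file system (`java.nio.file`) stands in for HDFS, and the marker name is borrowed from the `_SUCCESS` file that Hadoop normally writes into a job's output directory on success.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: skip a step when its output directory is already complete. */
final class OutputSkipSketch {

    /** Assumed completion marker, named after Hadoop's usual _SUCCESS file. */
    static final String DONE_MARKER = "_SUCCESS";

    /** A step counts as done only if the marker file exists in its output directory. */
    static boolean alreadyDone(Path outputDir) {
        return Files.isRegularFile(outputDir.resolve(DONE_MARKER));
    }

    /**
     * Runs the step unless its output is already complete.
     * Returns true if the step ran, false if it was skipped.
     */
    static boolean runIfNeeded(Path outputDir, Runnable step) throws IOException {
        if (alreadyDone(outputDir)) {
            return false;
        }
        Files.createDirectories(outputDir);
        step.run();
        // The marker is written only after the step finishes without throwing,
        // so a failed or interrupted step is retried on the next run.
        Files.createFile(outputDir.resolve(DONE_MARKER));
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("phases").resolve("phase0");
        System.out.println(runIfNeeded(out, () -> { })); // prints true (step ran)
        System.out.println(runIfNeeded(out, () -> { })); // prints false (skipped)
    }
}
```

Checking for a completion marker rather than just the directory avoids treating a half-written output as finished. The trade-off is that stale output has to be deleted by hand when the input changes.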