Question

I am new to the Mahout environment. I ran the following clusterdump command and got the output below:

/opt/hadoop/mahout-distribution-0.9/bin$ mahout clusterdump \
    -d /app/hadoop/dmacs/training_set1_sparseout/dictionary.file-0 \
    -dt sequencefile \
    -i /app/hadoop/dmacs/training_set1_sparseout/kmeans-clusters/clusters-2-final \
    -n 20 \
    -b 100 \
    -o /app/hadoop/dmacs/kmeans_final_output/cdump.txt \
    -dm org.apache.mahout.common.distance.CosineDistanceMeasure

:VL-1480{n=150 c=[1000062,3,2005:0.098, 1000079,1,2002:0.080, 1000079,2,2002:0.078, 1000079,3,2002:0.
    Top Terms:
            25                                      =>  10.670724073251089
            31                                      =>   7.999464999039968
            1664010,5,2005                          =>  1.2396535428365072
            2439493,1,2003                          =>   1.184131249586741
            507603,1,2005                           =>  0.9944797229766845
            199257,3,2005                           =>  0.9928587055206299
            2602249,3,2004                          =>  0.9890585215886434
            184705,3,2004                           =>  0.9728035926818848
            447759,5,2005                           =>  0.9652122163772583
            1152594,3,2004                          =>  0.9619592666625977
            104237,5,2005                           =>  0.9515269517898559
            1473980,3,2005                          =>  0.9478832610448201
            2118461,4,2005                          =>  0.9315701317787171
            1037245,3,2005                          =>  0.9236405754089355
            1639792,1,2002                          =>  0.9183504740397136
            1227322,1,2003                          =>  0.9121313015619914
            2019240,3,2004                          =>   0.909924259185791
            1117152,5,2005                          =>  0.9050878302256267
            2040853,3,2004                          =>  0.9025738382339478
            1309838,5,2005                          =>  0.8964522886276245

What do the top terms in this output actually mean? Thanks in advance!


Solution

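The top terms are the terms with the highest weights in the cluster's center vector (the c=[...] part of the VL-1480{n=150 ...} line), translated back into words through the dictionary you pass with -d. The number after => is that term's weight in the center, so the list shows the terms that best characterize the documents assigned to the cluster. You can control how many top terms are printed with the -n / --numWords flag of the clusterdump command; your -n 20 is why exactly 20 terms appear.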
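To make this concrete, here is a minimal Python sketch of what the Top Terms list amounts to, assuming (as in k-means) that the cluster center is the mean of its member vectors. The functions centroid and top_terms, the in-memory dict representation, and the toy terms and weights are all hypothetical stand-ins for Mahout's sequence files and internal Java code; the sketch only mirrors the idea of ranking center weights and mapping indices back through the dictionary.

```python
from collections import defaultdict


def centroid(vectors):
    """Mean of sparse vectors given as {term_index: weight} dicts (k-means-style center).

    Terms missing from a document count as weight 0, so we divide by the total
    number of documents, not by how many documents contain the term.
    """
    totals = defaultdict(float)
    for vec in vectors:
        for idx, weight in vec.items():
            totals[idx] += weight
    n = len(vectors)
    return {idx: total / n for idx, total in totals.items()}


def top_terms(center, dictionary, num_words):
    """Return the num_words (term, weight) pairs with the largest center weights.

    num_words plays the role of clusterdump's -n / --numWords flag in this sketch.
    """
    ranked = sorted(center.items(), key=lambda item: item[1], reverse=True)
    return [(dictionary[idx], weight) for idx, weight in ranked[:num_words]]


# Hypothetical toy data: a dictionary (index -> term) and three documents in one cluster.
dictionary = {0: "alpha", 1: "beta", 2: "gamma", 3: "delta"}
docs = [
    {0: 12.0, 1: 9.0, 2: 1.5},
    {0: 10.0, 1: 7.0, 3: 1.2},
    {0: 10.0, 1: 8.0, 2: 3.0},
]

# Print in a layout similar to clusterdump's "term => weight" lines.
for term, weight in top_terms(centroid(docs), dictionary, num_words=3):
    print(f"{term:<40}=>{weight:>20}")
```

Because the center averages over all documents in the cluster, a term ranks high only if it carries a large weight in many of the members, which is why the top of the list describes the cluster as a whole rather than any single document.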

For details about the flags, refer to the built-in help:

mahout-distribution-0.9$ bin/mahout clusterdump -h

Also have a look at the similar question "Interpreting output from mahout clusterdumper".

Licensed under: CC-BY-SA with attribution