Question

I am new to the Mahout environment. Running clusterdump on my k-means results gave the following output:

/opt/hadoop/mahout-distribution-0.9/bin$ mahout clusterdump \
    -d /app/hadoop/dmacs/training_set1_sparseout/dictionary.file-0 \
    -dt sequencefile \
    -i /app/hadoop/dmacs/training_set1_sparseout/kmeans-clusters/clusters-2-final \
    -n 20 \
    -b 100 \
    -o /app/hadoop/dmacs/kmeans_final_output/cdump.txt \
    -dm org.apache.mahout.common.distance.CosineDistanceMeasure

:VL-1480{n=150 c=[1000062,3,2005:0.098, 1000079,1,2002:0.080, 1000079,2,2002:0.078, 1000079,3,2002:0.
    Top Terms:
            25                                      =>  10.670724073251089
            31                                      =>   7.999464999039968
            1664010,5,2005                          =>  1.2396535428365072
            2439493,1,2003                          =>   1.184131249586741
            507603,1,2005                           =>  0.9944797229766845
            199257,3,2005                           =>  0.9928587055206299
            2602249,3,2004                          =>  0.9890585215886434
            184705,3,2004                           =>  0.9728035926818848
            447759,5,2005                           =>  0.9652122163772583
            1152594,3,2004                          =>  0.9619592666625977
            104237,5,2005                           =>  0.9515269517898559
            1473980,3,2005                          =>  0.9478832610448201
            2118461,4,2005                          =>  0.9315701317787171
            1037245,3,2005                          =>  0.9236405754089355
            1639792,1,2002                          =>  0.9183504740397136
            1227322,1,2003                          =>  0.9121313015619914
            2019240,3,2004                          =>   0.909924259185791
            1117152,5,2005                          =>  0.9050878302256267
            2040853,3,2004                          =>  0.9025738382339478
            1309838,5,2005                          =>  0.8964522886276245

What do the Top Terms in this output actually mean? Thanks in advance!


Solution

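The Top Terms are the most heavily weighted terms of the documents that belong to the cluster. clusterdump sorts the entries of the cluster's centroid vector (the c=[...] part of the cluster's header line) by weight. It then maps the indices back to words through the dictionary passed with -d and prints the highest-weighted ones. The number after => is the term's weight in the centroid. Because the centroid is the mean of the cluster's document vectors, this number is the term's average weight across the cluster's documents (n=150 of them here). You can control how many top terms are printed with the -n / --numWords flag of the clusterdump command.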
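The following is a minimal Python sketch of the idea above. It is not Mahout's actual implementation. The centroid is taken as the mean of the cluster's sparse document vectors. Its entries are sorted by weight, and the top num_words indices are mapped back to terms through a dictionary. The documents, weights, and the helper names centroid and top_terms are hypothetical toy values; only the term strings are borrowed from the question's output.

```python
from collections import defaultdict


def centroid(vectors):
    """Mean of sparse {index: weight} vectors; missing entries count as 0."""
    sums = defaultdict(float)
    for vec in vectors:
        for idx, weight in vec.items():
            sums[idx] += weight
    n = len(vectors)
    return {idx: total / n for idx, total in sums.items()}


def top_terms(center, dictionary, num_words):
    """Return the num_words (term, weight) pairs with the largest centroid weights."""
    ranked = sorted(center.items(), key=lambda kv: kv[1], reverse=True)
    return [(dictionary[idx], weight) for idx, weight in ranked[:num_words]]


# Hypothetical dictionary (index -> term) and three toy documents in one cluster,
# e.g. TF-IDF weights as produced when the sparse vectors were created.
dictionary = {0: "25", 1: "31", 2: "1664010,5,2005", 3: "2439493,1,2003"}
docs = [
    {0: 12.0, 1: 9.0, 2: 1.5},
    {0: 10.0, 1: 7.0, 3: 2.0},
    {0: 10.0, 1: 8.0, 2: 2.0},
]

center = centroid(docs)
for term, weight in top_terms(center, dictionary, num_words=3):
    print(f"{term:<20} => {weight:.4f}")
```

With these toy vectors, the terms present in every document with large weights (25 and 31) end up at the top. A term that appears in only some documents is averaged down by the zeros from the others.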

For details about all the flags, see the built-in help:

mahout-distribution-0.9$ bin/mahout clusterdump -h

Also have a look at the similar question "Interpreting output from mahout clusterdumper".
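To work with the Top Terms as data rather than text, for example to compare clusters, you can parse the file written with -o line by line. The sketch below assumes the layout shown in the question: a "Top Terms:" header followed by "term => weight" lines. It also assumes that cdump.txt is available on the local filesystem. parse_top_terms and TOP_TERM_RE are hypothetical names, not part of Mahout.

```python
import re

# Hypothetical pattern for lines like "   25      =>  10.670724073251089".
# Assumes terms never contain "=>"; they may contain commas.
TOP_TERM_RE = re.compile(
    r"^\s*(?P<term>\S.*?)\s*=>\s*"
    r"(?P<weight>[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)\s*$"
)


def parse_top_terms(lines):
    """Return one list of (term, weight) pairs per 'Top Terms:' block."""
    clusters = []
    current = None  # list being filled, or None when outside a block
    for line in lines:
        stripped = line.strip()
        if stripped == "Top Terms:":
            current = []
            clusters.append(current)
            continue
        if current is None or not stripped:
            continue  # skip text outside a block and blank lines
        match = TOP_TERM_RE.match(line)
        if match:
            current.append((match.group("term"), float(match.group("weight"))))
        else:
            current = None  # e.g. the next cluster's header line ends the block
    return clusters


# Example lines in the format shown in the question.
sample = [
    "    Top Terms:",
    "            25                                      =>  10.670724073251089",
    "            31                                      =>   7.999464999039968",
    "            1664010,5,2005                          =>  1.2396535428365072",
]
for i, terms in enumerate(parse_top_terms(sample)):
    print(i, terms[:2])
```

For the real file, pass an open file object instead, e.g. `with open("cdump.txt") as f: clusters = parse_top_terms(f)`.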

Licensed under: CC-BY-SA with attribution