Question

I'm trying to use Mallet to classify some documents on Windows.

It works on Linux, but I can't get it to do the job on Windows, which is the target environment.

I've imported the data into a .mallet file.

I then created a classifier from this input data. The classifier file is the same size on both machines:

-rw-r--r-- 1 henry henry 15197116 Feb 23 15:56 nntp.classifier

and

07/03/2014  21:28        15,197,116 nntp.classifier

However, when I run this on Linux:

bin/mallet classify-dir --input ./testfolder --output - --classifier nntp.classifier

it iterates over the files in testfolder and prints the class it assigns to each one.

But if I run the same command on Windows:

bin\mallet classify-dir --input ./testfolder --output - --classifier nntp.classifier

it just prints the command list:

Mallet 2.0 commands:
  import-dir        load the contents of a directory into mallet instances (one per file)
  import-file       load a single file into mallet instances (one per line)
  import-svmlight   load a single SVMLight format data file into mallet instances (one per line)
  train-classifier  train a classifier from Mallet data files
  train-topics      train a topic model from Mallet data files
  infer-topics      use a trained topic model to infer topics for new documents
  estimate-topics   estimate the probability of new documents given a trained model
  hlda              train a topic model using Hierarchical LDA
  prune             remove features based on frequency or information gain
  split             divide data into testing, training, and validation portions
Include --help with any option for more information

Something I did notice: if I run bin/mallet classify-dir --help on Linux, I get the full help output, but bin\mallet classify-dir --help on Windows does not produce the same result, just the command list above. (It does the same thing if you enter junk as the command.)

In contrast, one of the earlier commands behaves the same on both systems: bin/mallet import-dir --help and bin\mallet import-dir --help produce the same full help output.


Solution 2

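There's a problem with the mallet.bat file in the bin directory: the shipped version apparently has no entries for classify-dir and classify-file, so those commands fall through to the command list. Change the file to the following: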

@echo off

rem This batch file serves as a wrapper for several
rem  MALLET command line tools.

if not "%MALLET_HOME%" == "" goto gotMalletHome

echo MALLET requires an environment variable MALLET_HOME.
goto :eof

:gotMalletHome

set MALLET_CLASSPATH=%MALLET_HOME%\class;%MALLET_HOME%\lib\mallet-deps.jar
set MALLET_MEMORY=1G
set MALLET_ENCODING=UTF-8

set CMD=%1
shift

set CLASS=
if "%CMD%"=="import-dir" set CLASS=cc.mallet.classify.tui.Text2Vectors
if "%CMD%"=="import-file" set CLASS=cc.mallet.classify.tui.Csv2Vectors
if "%CMD%"=="import-smvlight" set CLASS=cc.mallet.classify.tui.SvmLight2Vectors
if "%CMD%"=="train-classifier" set CLASS=cc.mallet.classify.tui.Vectors2Classify
if "%CMD%"=="classify-dir" set CLASS=cc.mallet.classify.tui.Text2Classify
if "%CMD%"=="classify-file" set CLASS=cc.mallet.classify.tui.Csv2Classify
if "%CMD%"=="train-topics" set CLASS=cc.mallet.topics.tui.Vectors2Topics
if "%CMD%"=="infer-topics" set CLASS=cc.mallet.topics.tui.InferTopics
if "%CMD%"=="estimate-topics" set CLASS=cc.mallet.topics.tui.EvaluateTopics
if "%CMD%"=="hlda" set CLASS=cc.mallet.topics.tui.HierarchicalLDATUI
if "%CMD%"=="prune" set CLASS=cc.mallet.classify.tui.Vectors2Vectors
if "%CMD%"=="split" set CLASS=cc.mallet.classify.tui.Vectors2Vectors
if "%CMD%"=="bulk-load" set CLASS=cc.mallet.util.BulkLoader
if "%CMD%"=="run" set CLASS=%1 & shift

if not "%CLASS%" == "" goto gotClass

echo Mallet 2.0 commands:
echo   import-dir        load the contents of a directory into mallet instances (one per file)
echo   import-file       load a single file into mallet instances (one per line)
echo   import-svmlight   load a single SVMLight format data file into mallet instances (one per line)
echo   train-classifier  train a classifier from Mallet data files
echo   classify-dir      classify the contents of a directory with a saved classifier
echo   classify-file     classify a file with a saved classifier
echo   train-topics      train a topic model from Mallet data files
echo   infer-topics      use a trained topic model to infer topics for new documents
echo   estimate-topics   estimate the probability of new documents given a trained model
echo   hlda              train a topic model using Hierarchical LDA
echo   prune             remove features based on frequency or information gain
echo   split             divide data into testing, training, and validation portions
echo Include --help with any option for more information


goto :eof

:gotClass

set MALLET_ARGS=

:getArg

if "%1"=="" goto run
set MALLET_ARGS=%MALLET_ARGS% %1
shift
goto getArg

:run

java -Xmx%MALLET_MEMORY% -ea -Dfile.encoding=%MALLET_ENCODING% -classpath %MALLET_CLASSPATH% %CLASS% %MALLET_ARGS%

:eof

With this version of the file, you should be able to classify in Windows environments.
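To check whether an installed mallet.bat already handles classify-dir, one option is to list the command names it dispatches. The following is only a rough Python sketch and not part of MALLET: the function name dispatched_commands is made up for illustration, and it assumes the dispatch lines have the `if "%CMD%"=="name" set CLASS=...` form used in the file above.

```python
import re

# Matches dispatch lines such as:
#   if "%CMD%"=="classify-dir" set CLASS=cc.mallet.classify.tui.Text2Classify
# Assumption: the wrapper uses exactly this form, as in the batch file above.
DISPATCH_RE = re.compile(
    r'^[ \t]*if[ \t]+"%CMD%"[ \t]*==[ \t]*"([^"]+)"[ \t]+set[ \t]+CLASS=(\S+)',
    re.IGNORECASE | re.MULTILINE,
)


def dispatched_commands(bat_text):
    """Return a dict mapping each dispatched command name to its Java class."""
    return {m.group(1): m.group(2) for m in DISPATCH_RE.finditer(bat_text)}


# Small illustrative input; in practice, read the text of bin\mallet.bat instead.
sample = (
    'if "%CMD%"=="import-dir" set CLASS=cc.mallet.classify.tui.Text2Vectors\n'
    'if "%CMD%"=="train-classifier" set CLASS=cc.mallet.classify.tui.Vectors2Classify\n'
)
commands = dispatched_commands(sample)
for name in ("import-dir", "classify-dir"):
    print(name, "->", commands.get(name, "not dispatched"))
```

If classify-dir is reported as not dispatched for your own mallet.bat, the wrapper never sets CLASS for it, which would explain why it prints the command list instead of running the classifier.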

I hope this helps.

Ignazio

Other tips

Please note that there is a typo on line 23 of the .bat file provided by Ignazio (and, unfortunately, included in the mallet-2.0.7 download): it checks for "import-smvlight" instead of "import-svmlight", which is the name given in the help information. If you want to use this command, make sure you swap the 'm' and the 'v'.
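A mismatch like this can also be found mechanically by comparing the names the file checks for with the names its help text advertises. This is a minimal Python sketch under assumptions: the function name command_mismatches is hypothetical, dispatch lines are assumed to look like `if "%CMD%"=="name" ...`, and help lines are assumed to look like `echo   name   description` with at least two spaces around the name.

```python
import re

# Names the wrapper actually checks for, e.g. if "%CMD%"=="import-smvlight" ...
DISPATCH_RE = re.compile(
    r'^[ \t]*if[ \t]+"%CMD%"[ \t]*==[ \t]*"([^"]+)"',
    re.IGNORECASE | re.MULTILINE,
)
# Names the help text advertises, e.g. echo   import-svmlight   load a single ...
# Assumption: two or more spaces surround the name, as in the file above.
HELP_RE = re.compile(
    r'^[ \t]*echo[ \t]{2,}([a-z][a-z-]*)[ \t]{2,}',
    re.IGNORECASE | re.MULTILINE,
)


def command_mismatches(bat_text):
    """Return (advertised but not dispatched, dispatched but not advertised)."""
    dispatched = set(DISPATCH_RE.findall(bat_text))
    advertised = set(HELP_RE.findall(bat_text))
    return sorted(advertised - dispatched), sorted(dispatched - advertised)


# Two-command illustration of the typo described above.
sample = "\n".join([
    'if "%CMD%"=="import-smvlight" set CLASS=cc.mallet.classify.tui.SvmLight2Vectors',
    'if "%CMD%"=="split" set CLASS=cc.mallet.classify.tui.Vectors2Vectors',
    'echo Mallet 2.0 commands:',
    'echo   import-svmlight   load a single SVMLight format data file',
    'echo   split             divide data into testing, training, and validation portions',
])
missing, unlisted = command_mismatches(sample)
print("advertised but not dispatched:", missing)
print("dispatched but not advertised:", unlisted)
```

Commands such as bulk-load and run are dispatched without appearing in the help text, so a name in the second list is not necessarily an error; a name in the first list is the more likely sign of a typo.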

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow