Using LibShortText with files in LibSVM format

https://stackoverflow.com/questions/22135259

19-10-2022

Question

I'm trying to use LibShortText but I don't entirely understand how it works.

From the README, it looks like its functions are meant for text files. However, I need to classify files that are already in LibSVM format, so I suppose scripts like text-train.py and text-predict.py won't do...?

The README also states that:

If a preprocessor directory is given instead, then it is assumed that the 
training data is already in LIBSVM format.

Anybody know what a preprocessor directory is...?

From the LibShortText documentation, I see that there is a "Middle-level Classification Module - learner" that works on files like mine. However, I don't understand how it works! It doesn't have all the parameters that LIBSVM has, for example... And I haven't managed to find out how to save or interpret the results (where are the weights and predictions?).

If someone could explain how this thing works I'd really appreciate it (I've been testing it for quite a few hours now)... Thanks!

Solution

Per the docs (http://www.csie.ntu.edu.tw/~cjlin/libshorttext/doc/libshorttext.html#quick-start) you can append .svm files using the -A option.

But if you've already got your data in LIBSVM format, you can use liblinear (the library underlying libshorttext) directly.

If you've got a distribution of libshorttext, you already have liblinear. You can compile and run it by cd'ing into its directory (from wherever you've got libshorttext unpacked) like so:

$ cd libshorttext/classifier/learner/liblinear
$ make
$ ./train train_file.svm model_file
$ ./predict test_file.svm model_file output_file
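As for where the predictions end up: predict writes them to output_file. Here is a minimal Python sketch for checking them against the true labels. It assumes predict was run without the -b option, so that output_file holds one predicted label per line, and that the first token on each line of the test file is the true label. The helper names read_labels and accuracy are made up for this example and are not part of liblinear or libshorttext.

```python
def read_labels(lines):
    """Return the first whitespace-separated token of each non-empty line as a float."""
    # Converting to float lets "+1" in the test file match "1" in the output.
    return [float(line.split()[0]) for line in lines if line.strip()]


def accuracy(test_lines, output_lines):
    """Fraction of predicted labels that equal the true labels."""
    truth = read_labels(test_lines)
    predicted = read_labels(output_lines)
    if not truth:
        raise ValueError("no labelled instances found")
    if len(truth) != len(predicted):
        raise ValueError("test file and output file have different lengths")
    correct = sum(t == p for t, p in zip(truth, predicted))
    return correct / len(truth)


# With the real files from the commands above, this would be:
#   with open("test_file.svm") as t, open("output_file") as o:
#       print(accuracy(t, o))
```

predict also prints an accuracy figure when it runs, so this is mostly useful as a starting point if you want to post-process the predictions yourself.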

For reference, here is the liblinear README, which I've found very handy: https://github.com/ninjin/liblinear/blob/master/README
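As for the weights: they are stored in model_file, which is plain text. The sketch below reads them back in Python. It assumes the layout liblinear's train normally writes: a few "key value" header lines (solver_type, nr_class, label, nr_feature, bias), then a line containing just "w", then one row of weights per feature. Check this against your own model file; parse_liblinear_model is a hypothetical helper name, not a liblinear function.

```python
def parse_liblinear_model(lines):
    """Split a liblinear model file into (header dict, list of weight rows)."""
    header = {}
    weights = []
    in_weights = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if in_weights:
            # One row per feature; a row has several columns for multi-class models.
            weights.append([float(x) for x in line.split()])
        elif line == "w":
            # Everything after this marker is the weight matrix.
            in_weights = True
        else:
            key, _, value = line.partition(" ")
            header[key] = value.split()
    return header, weights


# With the real file from the commands above, this would be:
#   with open("model_file") as f:
#       header, weights = parse_liblinear_model(f)
```

In a two-class model, weights[i][0] is the weight of feature i+1, since feature indices in LIBSVM format start at 1.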

Licensed under: CC-BY-SA with attribution
