Question

I have a task where I need to classify a few million products. I came across Mahout today and started reading about it.

Right now I'm a bit confused about what "classifier" means in Mahout. I thought a classifier would let you assign a document to whichever category it matches.

After some reading, though, it seems to be more about deciding whether a document is a or !a, not about checking whether a document is a or b or c or d ...

What I'm looking for is a way to choose among multiple possibilities like a or b or c or d ... Am I on the wrong track with Mahout, or is Mahout also built for this kind of task? I would like to use a supervised learning algorithm for this part, and I don't really know whether Mahout is the right framework for it.

Any pointers?


Solution

I think you could probably make Mahout work for your problem. I haven't done it myself, so I can't give you specifics, but here are two approaches:

1) Train a binary classifier for each of the N categories (a or !a, b or !b, c or !c, d or !d, ...), then assign the category whose classifier reports the highest probability. Classifiers typically output probabilities rather than True/False.

2) Check out this article on multi-label classification using Mahout: https://medium.com/p/4ea08a4662ab
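To make approach 1 concrete, here is a minimal one-vs-rest sketch in plain Python. It is not Mahout code: the tiny logistic regression, the `TRAINING` data, and every function name are hypothetical and only illustrate the idea of training one "category or not" classifier per category and then picking the highest probability.

```python
import math


def tokenize(text):
    """Split a product title into lowercase word tokens."""
    return text.lower().split()


class BinaryLogReg:
    """Tiny bag-of-words logistic regression (illustrative, not tuned)."""

    def __init__(self, lr=0.5, epochs=200):
        self.lr = lr
        self.epochs = epochs
        self.weights = {}
        self.bias = 0.0

    def predict_proba(self, tokens):
        # Probability that the document belongs to this model's category.
        z = self.bias + sum(self.weights.get(t, 0.0) for t in tokens)
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, docs, labels):
        # Plain stochastic gradient ascent on the log-likelihood.
        for _ in range(self.epochs):
            for tokens, y in zip(docs, labels):
                err = y - self.predict_proba(tokens)
                self.bias += self.lr * err
                for t in tokens:
                    self.weights[t] = self.weights.get(t, 0.0) + self.lr * err


def train_one_vs_rest(examples):
    """Train one 'category vs. everything else' model per category."""
    docs = [tokenize(text) for text, _ in examples]
    models = {}
    for cat in sorted({c for _, c in examples}):
        labels = [1.0 if c == cat else 0.0 for _, c in examples]
        model = BinaryLogReg()
        model.fit(docs, labels)
        models[cat] = model
    return models


def classify(models, text):
    """Return the category with the highest probability, plus all scores."""
    tokens = tokenize(text)
    scores = {cat: m.predict_proba(tokens) for cat, m in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy data; a real product catalogue would be far larger.
TRAINING = [
    ("red cotton t-shirt", "clothing"),
    ("blue denim jeans", "clothing"),
    ("wireless bluetooth headphones", "electronics"),
    ("usb charging cable", "electronics"),
    ("stainless steel frying pan", "kitchen"),
    ("ceramic coffee mug", "kitchen"),
]

if __name__ == "__main__":
    models = train_one_vs_rest(TRAINING)
    label, scores = classify(models, "blue cotton jeans")
    print(label, scores)
```

The per-category probabilities do not have to sum to 1, because each model is trained independently; only their ranking is used for the final assignment.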

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow