Question

I have a use case in which chat text needs to be classified. I want to use DocumentCategorizer in Apache OpenNLP to categorize the chats, but for that I need training data consisting of chats that are already classified. Do I have to manually categorize hundreds of chats to prepare training and test data? What else can I do? I intend the chat categories to be service-related problems, so the list of categories would be domain specific. Should the provider of this data supply me with the categorized chat data? Thanks in advance.


Solution

By definition, you cannot train a supervised classifier without labelled data. Either someone labels (at least part of) the data, or you should try to address the problem in a different way.

-- Edited to add some examples of how to address the problem without classifying:

In general, depending on the specific task, you can try to solve a "classification" problem via clustering and/or document or term matching. Clustering groups together documents related to the same topic, while term matching finds documents that mention specific terms. If no training data is available but you have some knowledge about the problem, either method, or a combination of the two, might be enough for your information need.
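
As a rough illustration of the term matching idea, here is a minimal sketch in plain Java (standard library only, not OpenNLP). The category names and trigger terms are made-up placeholders, and the class and method names are my own; you would replace the term lists with whatever you know about your service domain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class TermMatcher {
    // Hypothetical categories and trigger terms; replace with your own domain knowledge.
    static final Map<String, List<String>> TERMS = new LinkedHashMap<>();
    static {
        TERMS.put("BILLING", Arrays.asList("invoice", "charge", "refund"));
        TERMS.put("CONNECTIVITY", Arrays.asList("offline", "disconnect", "slow"));
    }

    // Returns every category that has at least one term occurring as a whole token in the chat.
    static List<String> match(String chat) {
        List<String> tokens =
                Arrays.asList(chat.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"));
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : TERMS.entrySet()) {
            for (String term : entry.getValue()) {
                if (tokens.contains(term)) {
                    hits.add(entry.getKey());
                    break; // one matching term is enough for this category
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(match("I was billed twice, please refund me"));
        System.out.println(match("The connection is slow and goes offline"));
    }
}
```

This only matches exact tokens, so "charged" would not hit the term "charge". Stemming or a larger term list would be the next step if that matters for your chats.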

For your specific problem, I would start by trying to cluster the chats.
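
To give a feel for what clustering chats can look like, here is a minimal sketch in plain Java. It is not a recommendation of a particular algorithm: I am assuming a very simple single-pass "leader" clustering with Jaccard similarity over word sets, and the class name, method names and threshold are all my own choices. In practice you would more likely use TF-IDF vectors with k-means or a similar method from a library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class ChatClusterer {
    // Lowercased set of alphanumeric tokens in a chat.
    static Set<String> tokens(String chat) {
        Set<String> result = new HashSet<>(
                Arrays.asList(chat.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")));
        result.remove(""); // split can produce an empty leading token
        return result;
    }

    // Jaccard similarity: shared tokens divided by all distinct tokens.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) intersection.size() / union.size();
    }

    // Single pass: a chat joins the first cluster whose leader (its first chat)
    // is similar enough; otherwise it starts a new cluster.
    static List<List<String>> cluster(List<String> chats, double threshold) {
        List<Set<String>> leaders = new ArrayList<>();
        List<List<String>> clusters = new ArrayList<>();
        for (String chat : chats) {
            Set<String> chatTokens = tokens(chat);
            int target = -1;
            for (int i = 0; i < leaders.size(); i++) {
                if (jaccard(chatTokens, leaders.get(i)) >= threshold) {
                    target = i;
                    break;
                }
            }
            if (target < 0) {
                leaders.add(chatTokens);
                clusters.add(new ArrayList<>());
                target = clusters.size() - 1;
            }
            clusters.get(target).add(chat);
        }
        return clusters;
    }

    public static void main(String[] args) {
        List<String> chats = Arrays.asList(
                "my internet is down",
                "internet is down again",
                "refund my last invoice",
                "please refund the invoice");
        for (List<String> group : cluster(chats, 0.3)) {
            System.out.println(group);
        }
    }
}
```

Once the chats are grouped, you can read a few chats per cluster and name the cluster yourself, which is much cheaper than labelling every chat by hand. The result depends on the order of the chats and on the threshold, so treat it as a starting point only.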

OTHER TIPS

While clustering lets you group your texts and identify topics in them, unsupervised methods give you less control over the categories and over classification performance. They remain the best tools if you do not have labeled data.

However, recent advances in zero-shot and few-shot learning can let you build your classifier with little training data (100-200 labeled examples) or none at all. Such a classifier keeps the main benefits of a supervised classifier and still gives you control over your categories.

I have built one such system and you can try out the demo on your own categories and data to see the system in action.

Additional resources:

  1. https://www.quora.com/Whats-the-difference-between-one-shot-learning-and-zero-shot-learning
  2. https://arxiv.org/abs/1710.10280
Licensed under: CC-BY-SA with attribution