Frage

Why are NLP Processes considered language-dependent?

For example, here: http://www.slideshare.net/saschanarr/languageindependent-twitter-sentiment-analysis on slide 6, its says that: "Natural Language Processing methods are often designed specifically for one language".

Why is it so? I would think that once the method is implemented using machine learning, the algorithm is the same and all you need different is the training set...

War es hilfreich?

Lösung

In the case of heuristics, those are usually problem- and language-dependent. In the case of machine learning, yes, in an abstract, theoretical sense, the "only" difference is the training set. The availability of training sets for various languages is the first problem. Then comes the number of useful features that can be pruned from the training set, the availability of heuristics and knowledge sources to improve the machine learning, the hyperparameters required to make the learning successful, etc.

As an example, consider the problem of named-entity recognition (NER). On English data, the "word is capitalized" feature is almost a giveaway for spotting the names, but in German, every noun is capitalized. The result is that NER for German is quite a different problem than it is for English.

Lizenziert unter: CC-BY-SA mit Zuschreibung
Nicht verbunden mit StackOverflow
scroll top