Why are NLP processes considered language-dependent? [closed]

https://stackoverflow.com/questions/18856369

28-06-2022

Question

Why are NLP processes considered language-dependent?

For example, slide 6 of http://www.slideshare.net/saschanarr/languageindependent-twitter-sentiment-analysis says: "Natural Language Processing methods are often designed specifically for one language".

Why is that so? I would think that once a method is implemented using machine learning, the algorithm stays the same and the only thing that needs to change is the training set...

Solution

Heuristics are usually both problem- and language-dependent. With machine learning, yes, in an abstract, theoretical sense the "only" difference is the training set. In practice, the availability of training sets for various languages is the first problem. Then come the number of useful features that can be extracted from the training set, the availability of heuristics and knowledge sources to improve the machine learning, the hyperparameters required to make the learning successful, and so on.

As an example, consider the problem of named-entity recognition (NER). On English data, the "word is capitalized" feature is almost a giveaway for spotting names, but in German every noun is capitalized, so the same feature says far less about whether a word is a name. The result is that NER for German is quite a different problem from NER for English.
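To make the capitalization point concrete, here is a minimal sketch rather than a real NER system. It treats every capitalized word that is not sentence-initial as a candidate name. The function name `capitalized_candidates` and the two sample sentences are made up for illustration, and the tokenizer is a plain whitespace split.

```python
def capitalized_candidates(sentence):
    """Return the non-initial tokens that start with an uppercase letter."""
    # Naive tokenization: split on whitespace, strip surrounding punctuation.
    tokens = [t.strip(".,;:!?\"'") for t in sentence.split()]
    tokens = [t for t in tokens if t]
    # Skip the first token: it is capitalized in both languages regardless.
    return [t for t in tokens[1:] if t[0].isupper()]


# Hypothetical sample sentences with the same meaning; the only real
# name in each is "Angela Merkel".
english = "Yesterday the chancellor met Angela Merkel in a small restaurant near the station."
german = "Gestern traf der Kanzler Angela Merkel in einem kleinen Restaurant neben dem Bahnhof."

print(capitalized_candidates(english))  # ['Angela', 'Merkel']
print(capitalized_candidates(german))   # ['Kanzler', 'Angela', 'Merkel', 'Restaurant', 'Bahnhof']
```

On the English sentence the heuristic returns exactly the two name tokens. On the German one it also returns three common nouns, so a learner trained on German has to rely on other features, even if the learning algorithm itself is unchanged.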

Licensed under: CC-BY-SA with attribution
