Would Topic Modelling be classified as NLP or NLU?

https://datascience.stackexchange.com/questions/64714

20-10-2020

Question

I recently started my journey into the world of NLP, and it's been one heck of a ride. I'm currently trying to understand whether topic modelling would be considered NLP or NLU.

Initially I would assume that topic modelling would be classified as NLP. However, if we use word embeddings for topic modelling, wouldn't it then be classified as NLU, since we have a deeper understanding of how the words relate to each other in vector space?

Maybe I'm having trouble formulating the inherent difference between NLP and NLU. When do we draw the line between the two?

Your insight regarding this matter would be highly appreciated.

Solution

"Maybe I'm having trouble formulating the inherent difference between NLP and NLU. When do we draw the line between the two?"

There is a confusion here: NLP is the whole area of AI that deals with natural language. It includes virtually any task related to processing language data (mostly written data, but that's not the point). Topic modeling is one of these tasks.

NLU is the problem of Natural Language Understanding, which is usually considered one of the main goals of NLP. If anything, NLU is a problem that NLP tries to solve, i.e. a sub-topic within the large area of NLP.

Also note that using word embeddings can improve the results, but it doesn't solve all the difficulties related to semantics, far from it.
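One such difficulty can be sketched with a toy example. The snippet below is only an illustration under stated assumptions: the three-dimensional vectors are invented (they do not come from any trained model), and averaging word vectors is just one simple way of combining them. It shows that both a bag-of-words representation (the usual input to a classic topic model such as LDA) and an average of static word vectors ignore word order, so two sentences with opposite meanings end up with identical representations.

```python
import math
from collections import Counter

# Toy 3-dimensional "embeddings". The values are invented for illustration
# only and are not taken from any real embedding model.
TOY_VECTORS = {
    "dog": [0.75, 0.25, 0.0],
    "man": [0.25, 0.75, 0.0],
    "bites": [0.0, 0.25, 0.75],
}


def bag_of_words(text):
    """Word counts, the usual input to a classic topic model such as LDA."""
    return Counter(text.lower().split())


def mean_embedding(text):
    """Average the toy word vectors of a sentence (word order is ignored)."""
    vectors = [TOY_VECTORS[word] for word in text.lower().split()]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


a = "dog bites man"
b = "man bites dog"

# Both comparisons hold even though the two sentences mean different things.
same_bag = bag_of_words(a) == bag_of_words(b)
same_mean = all(
    math.isclose(x, y) for x, y in zip(mean_embedding(a), mean_embedding(b))
)
print(same_bag, same_mean)  # prints: True True
```

Knowing which words are close to each other in vector space is therefore not the same as knowing who did what to whom, which is the kind of interpretation a stricter notion of understanding would require.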

[edit] The scope of NLU is not strictly defined: in the broadest possible definition, it would include anything vaguely related to extracting meaning from text, and in this very generous sense topic modeling would have a connection to it with or without embeddings (and so would a lot of other NLP tasks). Wikipedia says:

The umbrella term "natural-language understanding" can be applied to a diverse set of computer applications, ranging from small, relatively simple tasks such as short commands issued to robots, to highly complex endeavors such as the full comprehension of newspaper articles or poetry passages. Many real world applications fall between the two extremes, for instance text classification for the automatic analysis of emails and their routing to a suitable department in a corporation does not require in depth understanding of the text.

But the most commonly accepted definition of NLU is stricter: it only covers tasks which directly involve the interpretation of text in a fairly complex setting. The typical example is the "virtual assistant", such as Amazon Alexa, OK Google or Apple's Siri. In this sense, topic modeling is simply a completely different task, no matter the "degree of understanding".

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange