What are the applications of length normalization?

https://stackoverflow.com/questions/21444074

04-10-2022
|

Question

I found some info about Length Normalization. I found it mentioned only in the context of search engines. Have people used it for different textual purposes? (please forgive my ignorance. I've truly searched for other uses of it but google keeps confusing the term "normalization" with "scaling"...).

Solution

The link you provide in the question already mentions one reason for using length-normalization: to avoid having high term-frequency counts in document vectors. This affects document ranking considerably. A direct application of this is, of course, query-based document retrieval.

There are other algorithm-specific applications as well. For example, if you want to cluster documents using cosine similarity between the vectors: simple clustering algorithms such as k-means may not converge unless the vectors are all on a sphere, i.e. all vectors have the same length.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow