Question

I found some info about Length Normalization. I found it mentioned only in the context of search engines. Have people used it for different textual purposes? (please forgive my ignorance. I've truly searched for other uses of it but google keeps confusing the term "normalization" with "scaling"...).

Was it helpful?

Solution

The link you provide in the question already mentions one reason for using length-normalization: to avoid having high term-frequency counts in document vectors. This affects document ranking considerably. A direct application of this is, of course, query-based document retrieval.

There are other algorithm-specific applications as well. For example, if you want to cluster documents using cosine similarity between the vectors: simple clustering algorithms such as k-means may not converge unless the vectors are all on a sphere, i.e. all vectors have the same length.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top