Question

I am trying to cluster customers by behavior, based on where they shop (given as lat/long pairs). I also have other numeric attributes, such as volume and average amount spent. I am considering HDBSCAN for the clustering, but I'm not sure whether to feed the dataframe directly to the algorithm or to scale/normalize the data first.

Is it wise to scale the geolocation pairs? Or would important location information be lost?

Any help would be much appreciated.

https://stats.stackexchange.com/questions/89809/is-it-important-to-scale-data-before-clustering

This page explains a lot. However, in his answer, @Anony-Mousse says not to scale lat/long pairs. That's fine for the coordinates, but what about the other continuous variables?
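To make the setup concrete, here is a minimal sketch of the preprocessing I have in mind: the lat/long columns are left untouched and only the other numeric columns are standardized (z-scored). The function name `build_features` and the sample values are made up for illustration, and whether this mix is appropriate is exactly what I am asking.

```python
import numpy as np

def build_features(lat_deg, lon_deg, other):
    """Keep lat/long as-is; z-score every other numeric column.

    Hypothetical helper for illustration only.
    """
    lat_deg = np.asarray(lat_deg, dtype=float)
    lon_deg = np.asarray(lon_deg, dtype=float)
    other = np.asarray(other, dtype=float)

    mean = other.mean(axis=0)
    std = other.std(axis=0)
    std[std == 0] = 1.0  # constant columns: avoid division by zero

    scaled = (other - mean) / std
    # Columns: [lat, lon, scaled attribute 1, scaled attribute 2, ...]
    return np.column_stack([lat_deg, lon_deg, scaled])

# Made-up sample: four customers, columns of `other` are
# (volume, average amount spent).
lat = [40.71, 40.73, 34.05, 34.10]
lon = [-74.00, -73.99, -118.24, -118.30]
other = [[12, 35.0], [15, 40.0], [3, 120.0], [4, 110.0]]

X = build_features(lat, lon, other)
print(X.shape)
```

The resulting `X` is what I would pass to the clustering algorithm. Note that it mixes degrees with unitless z-scores in a single distance, so the relative weight of location versus the other attributes depends on this choice.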

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange