Question

I am trying to see whether the performance of these two families of clustering algorithms can be compared based on the objective functions they optimize.

Hierarchical: single linkage, complete linkage, and average linkage

Non-hierarchical: fuzzy c-means and k-means


Solution

Don't confuse algorithms and tasks.

There is more than one algorithm for k-means; in fact there are at least two dozen. Some use k-d trees for acceleration, for example. And they are all heuristics, because finding the optimal k-means solution has been shown to be NP-hard in general.
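For illustration, here is a minimal sketch of Lloyd's algorithm, the textbook k-means heuristic (data, names, and parameters are made up; real libraries add smarter initialization and acceleration):

    import numpy as np

    def lloyd_kmeans(X, k, n_iter=100, seed=0):
        # Lloyd's algorithm: alternate between assigning points to the
        # nearest center and recomputing centers as cluster means.
        # It only converges to a *local* optimum of the k-means objective.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # squared distances of every point to every center: shape (n, k)
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        return labels, centers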

Similarly, there is the naive approach to agglomerative hierarchical clustering with O(n^3) runtime and O(n^2) memory, and then there are algorithms such as SLINK for single-linkage and CLINK for complete-linkage hierarchical clustering that run in O(n^2) time and O(n) memory.
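If you just want the hierarchy, you usually don't implement this yourself; for example, SciPy's linkage function exposes single, complete, and average linkage (a rough usage sketch with toy data):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    X = np.random.rand(200, 2)            # toy data

    # Full single-linkage hierarchy, encoded as a linkage matrix.
    Z = linkage(X, method='single')       # also: 'complete', 'average'

    # Cut the dendrogram at a fixed distance to get flat clusters.
    labels = fcluster(Z, t=0.1, criterion='distance')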

Last but not least, if you use DBSCAN and set minPts=2, the result will effectively be the same as single-link hierarchical clustering cut at height epsilon. Yet, with an appropriate index, DBSCAN runs in O(n log n) (e.g. in ELKI; the scikit-learn implementation is not as clever and can need O(n^2) memory and runtime in the worst case).
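A quick way to convince yourself of that equivalence (a sketch with toy data; any disagreement comes down to how noise points vs. singleton clusters are labeled):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from sklearn.cluster import DBSCAN
    from sklearn.metrics import adjusted_rand_score

    X = np.random.rand(300, 2)
    eps = 0.05

    # Single-linkage dendrogram, cut at height eps.
    single_link = fcluster(linkage(X, method='single'), t=eps, criterion='distance')

    # DBSCAN with min_samples=2: every point with an eps-neighbor is a core
    # point, so the density-connected components match the single-link
    # clusters; the only difference is that isolated points get label -1
    # (noise) instead of forming singleton clusters.
    dbscan = DBSCAN(eps=eps, min_samples=2).fit_predict(X)

    # Should be (close to) 1.0; differences stem from the noise labeling.
    print(adjusted_rand_score(single_link, dbscan))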

So why can you have such different runtimes for apparently the same task?

First, some algorithms (and implementations) are simply primitive: easy to implement and understand, but not as clever as they could be. Second, some algorithms do extra work that you may or may not need. Single-linkage hierarchical clustering computes a full hierarchy; DBSCAN with minPts=2 computes only a single cut through this hierarchy, which is a much simpler result. And last but not least, there are heuristics. Most k-means variants fall into this category: by accepting that you may miss the globally best solution, you can of course be faster than an exhaustive search.
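For example, with scikit-learn's k-means you can trade runtime against solution quality simply by changing the number of random restarts (a toy sketch; the data and parameter values are arbitrary):

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=500, centers=5, random_state=0)

    # One run of the heuristic may get stuck in a local optimum ...
    one_run = KMeans(n_clusters=5, n_init=1, random_state=42).fit(X)
    # ... so a common workaround is several random restarts, keeping the run
    # with the lowest objective value (inertia_ = within-cluster sum of squares).
    ten_runs = KMeans(n_clusters=5, n_init=10, random_state=42).fit(X)

    print(one_run.inertia_, ten_runs.inertia_)   # ten_runs is at least as good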

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow