K-Means clustering on multidimensional heterogeneous space

Question 1

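k-means on raw latitude/longitude is not a good choice: it will not handle the wrap-around at 180° longitude, and distances anywhere but at the equator will be distorted, because a degree of longitude covers less ground the further you move towards the poles. IIRC, in the northern USA and most parts of Europe the distortion is already over 20%.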
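To make both problems concrete, here is a minimal sketch that compares a naive Euclidean distance on raw degrees with the haversine (great-circle) distance. The function names and the mean Earth radius of 6371 km are my own choices for illustration, not something from the question.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # min() guards against rounding pushing the value slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def naive_degree_distance(lat1, lon1, lat2, lon2):
    """Euclidean distance on raw degrees, which is what k-means effectively uses."""
    return math.hypot(lat2 - lat1, lon2 - lon1)


# Two points on the equator, one degree apart across the 180° meridian.
wrap_naive = naive_degree_distance(0.0, 179.5, 0.0, -179.5)  # 359 "degrees" apart
wrap_true = haversine_km(0.0, 179.5, 0.0, -179.5)            # roughly 111 km

# One degree of longitude at the equator versus at 60° north.
deg_equator = haversine_km(0.0, 0.0, 0.0, 1.0)
deg_60n = haversine_km(60.0, 0.0, 60.0, 1.0)

print(wrap_naive, round(wrap_true, 1))
print(round(deg_equator, 1), round(deg_60n, 1))
```

The naive distance treats the two points next to the date line as almost a full circle apart, and at 60° north a degree of longitude covers only about half the ground it covers at the equator.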

Similarly, it does not make sense to use k-means on binary data. To be precise, it is the mean that does not make sense there.

Use an algorithm that can work with arbitrary distances, and construct a combined distance function that is designed for solving your problem, on your particular data set.

Then use e.g. PAM, DBSCAN, hierarchical linkage clustering, or any other algorithm that works with arbitrary distance functions.
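As a rough illustration of what "works with arbitrary distances" means, here is a simplified k-medoids sketch that only ever looks at a precomputed distance matrix. Note that this is not full PAM: it alternates between assignment and medoid update instead of doing PAM's swap search, and the function names, the naive initialisation, and the toy data are all assumptions of mine. In practice you would probably reach for an existing implementation.

```python
def assign_to_medoids(dist, medoids):
    """Label each point with the index of its closest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """Simplified k-medoids on a precomputed n x n distance matrix."""
    n = len(dist)
    medoids = list(range(k))  # naive initialisation: the first k points
    for _ in range(max_iter):
        labels = assign_to_medoids(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Empty cluster (e.g. duplicate points): keep the old medoid.
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the others.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign_to_medoids(dist, medoids)


# Toy data: any distance can be plugged in here; absolute difference is a stand-in.
points = [0, 2, 1, 50, 51, 53]
dist = [[abs(p - q) for q in points] for p in points]

medoids, labels = k_medoids(dist, k=2)
print(medoids, labels)
```

The algorithm never computes a mean, only distances between existing points, so the matrix can come from any distance function you design for your data.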

Question 2

The mean of a binary feature can be seen as the frequency of that feature. There are cases in which one can standardise a binary feature v by v-\bar{v}.

However, in your case it seems to me that you have three features in three different feature spaces. I'd approach this problem by creating three distances d_v, one appropriate for each feature v \in V. The distance between two entities, say x and y, would be given by d(x,y) = \sum_{v \in V} w_v d_v(x_v, y_v). You could play with the weights w_v, but I'd probably constrain them so that \sum_{v \in V} w_v = 1 and w_v \geq 0 for all v \in V.
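A minimal sketch of such a weighted combination could look like the following. The three feature names ("location", "flag", "value"), the per-feature distances, and the scaling constants are purely hypothetical, since I don't know your exact features. The only part taken from the formula above is the weighted sum with non-negative weights that add up to 1.

```python
import math


def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) pairs given in degrees."""
    (lat1, lon1), (lat2, lon2) = p, q
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2)
         * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))


# Hypothetical per-feature distances d_v, each scaled to roughly [0, 1].
FEATURE_DISTANCES = {
    # divide by (about) half the Earth's circumference, the largest possible value
    "location": lambda p, q: haversine_km(p, q) / 20015.0,
    # simple mismatch distance for a binary feature
    "flag": lambda a, b: 0.0 if a == b else 1.0,
    # assumed scale of 100 units, capped at 1
    "value": lambda a, b: min(abs(a - b) / 100.0, 1.0),
}


def combined_distance(x, y, weights, feature_distances=FEATURE_DISTANCES):
    """d(x, y) = sum_v w_v * d_v(x_v, y_v), with w_v >= 0 and sum_v w_v = 1."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * feature_distances[v](x[v], y[v]) for v, w in weights.items())


weights = {"location": 0.5, "flag": 0.25, "value": 0.25}
x = {"location": (48.85, 2.35), "flag": 1, "value": 10.0}
y = {"location": (48.85, 2.35), "flag": 0, "value": 60.0}

d_xy = combined_distance(x, y, weights)
print(d_xy)
```

Scaling every d_v to a comparable range before weighting keeps the weights interpretable. Otherwise a feature measured in kilometres would swamp a 0/1 feature no matter what w_v you pick.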

The above are just some quick thoughts on it, good luck! PS: Sorry for the text, I'm new here and I don't know how to put latex text here