Question

I haven't had much experience with machine learning or clustering, so I'm at a bit of a loss as to how to approach this problem. My data of interest consists of 4 columns, one of which is just an id. The other 3 contain numerical values, all >= 0. The clustering I need is actually quite straightforward, and I could do it by hand, but it will get less clear later on, so I want to start out with the right sort of process. I need 6 clusters, which depend on the 3 columns (call them A, B and C) as follows:

A    B    C        Cluster
---- ---- -------- -------
0    0    0        0
0    0    >0       1
0    >0   <=B      2
0    >0   >B       3
>0   any  <=(A+B)  4
>0   any  >(A+B)   5
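Because the six groups are fixed entirely by the table, assigning them needs no learning step at all. Below is a minimal Python sketch, assuming each row arrives as an (id, A, B, C) tuple. The function names `assign_cluster` and `label_rows` and the sample ids are hypothetical, chosen only for illustration.

```python
from typing import Iterable


def assign_cluster(a: float, b: float, c: float) -> int:
    """Return the cluster label (0-5) from the rule table; A, B, C must be >= 0."""
    if a < 0 or b < 0 or c < 0:
        raise ValueError("A, B and C must all be >= 0")
    if a > 0:
        # Rows 4 and 5: B can be anything, so compare C with A + B.
        return 4 if c <= a + b else 5
    if b > 0:
        # Rows 2 and 3: A == 0 and B > 0, so compare C with B.
        return 2 if c <= b else 3
    # Rows 0 and 1: A == 0 and B == 0, so only whether C is positive matters.
    return 1 if c > 0 else 0


def label_rows(rows: Iterable[tuple[str, float, float, float]]) -> dict[str, int]:
    """Map each (hypothetical) row id to its cluster label."""
    return {row_id: assign_cluster(a, b, c) for row_id, a, b, c in rows}


# One hypothetical example row per table line.
sample = [
    ("r1", 0, 0, 0),
    ("r2", 0, 0, 3),
    ("r3", 0, 2, 2),
    ("r4", 0, 2, 5),
    ("r5", 1, 4, 5),
    ("r6", 1, 4, 6),
]
print(label_rows(sample))  # {'r1': 0, 'r2': 1, 'r3': 2, 'r4': 3, 'r5': 4, 'r6': 5}
```

Testing A first keeps each branch simple: once A is known to be 0, the remaining rows only compare C with B.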

At this stage, these clusters will give insight into the data to inform further analysis.

Since I'm quite new to this, I haven't yet learned enough about the various clustering algorithms, so I don't really know where to start. Could anyone suggest an appropriate model to use, or a few that I could research?


Solution

This does not look like clustering to me.

Instead, I figure you want simple decision tree classification.

It should already be available in RapidMiner.
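The rule table is, in effect, already a decision tree. Standard decision tree learners typically split on one attribute at a time, so the diagonal tests C <= B and C <= A+B only become single splits if derived attributes such as C - B and C - (A+B) are available. The sketch below is in Python rather than RapidMiner and hand-builds such a tree instead of learning it. The feature names `C_minus_B` and `C_minus_AB`, and the `Leaf`/`Split` classes, are hypothetical and only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Union

# Hypothetical derived features: they turn the diagonal tests
# C <= B and C <= A + B into single-feature threshold tests.
FEATURES: dict[str, Callable[[float, float, float], float]] = {
    "A": lambda a, b, c: a,
    "B": lambda a, b, c: b,
    "C": lambda a, b, c: c,
    "C_minus_B": lambda a, b, c: c - b,
    "C_minus_AB": lambda a, b, c: c - (a + b),
}


@dataclass
class Leaf:
    label: int


@dataclass
class Split:
    feature: str
    threshold: float
    left: "Node"   # followed when feature value <= threshold
    right: "Node"  # followed when feature value > threshold


Node = Union[Leaf, Split]

# Since A, B, C >= 0, the test "A <= 0" means "A == 0" (same for B and C).
TREE: Node = Split(
    "A", 0.0,
    left=Split(
        "B", 0.0,
        left=Split("C", 0.0, Leaf(0), Leaf(1)),
        right=Split("C_minus_B", 0.0, Leaf(2), Leaf(3)),
    ),
    right=Split("C_minus_AB", 0.0, Leaf(4), Leaf(5)),
)


def predict(node: Node, a: float, b: float, c: float) -> int:
    """Walk from the root to a leaf, one threshold test per level."""
    while isinstance(node, Split):
        value = FEATURES[node.feature](a, b, c)
        node = node.left if value <= node.threshold else node.right
    return node.label


for a, b, c in [(0, 0, 0), (0, 2, 5), (1, 4, 6)]:
    print((a, b, c), "->", predict(TREE, a, b, c))
```

Given labelled examples and these five attributes, a tree of this shape is one a learner could recover. On the raw A, B and C alone, it would typically have to approximate the diagonal boundaries with many small axis-aligned steps.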

OTHER TIPS

You could use the "Generate Attributes" operator.

This creates new attributes from existing ones.

It would be relatively tedious to write out all the rules, but they would look something like this:

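cluster : if(A>0, if(C<=A+B, 4, 5), if(B>0, if(C<=B, 2, 3), if(C>0, 1, 0)))

Because A, B and C are never negative, a failed A>0 test means A is 0 (and likewise for B), so the nested if() calls reproduce all six rows of the table.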

Licensed under: CC-BY-SA with attribution