Question

I have a set of 3D points in Matlab; my data can be found here. As you can see, there are some outliers that are affecting my clustering results. Could anyone please advise how I can delete these outliers from my data?

Solution

Having looked at your data, I don't think any clustering algorithm will do what you want. Instead, you will probably need to train a classifier. This is what the Kinect people did: they trained a classifier on millions of real and synthetic postures to have it label limbs, head, etc.

The reason I don't think density-based clustering will work either is that your data is a single, density-connected, body-with-two-boxes-shaped blob. Without knowing what a "body" and a "box" are, segmentation will be rather arbitrary. In the case of density-based clustering, it will either not segment at all, or it will segment e.g. by the rather low resolution of your z axis. Furthermore, your X and Y axes come from a grid-based image scan (I assume), so you have a very uniform density on the X and Y axes, and the arms, for example, are not of a lower density than the body or boxes.

You can, however, use DBSCAN with rather broad (and easy to set) parameters to remove the noise.

E.g. in ELKI the following parameters yield reasonable results:

java -jar elki.jar -dbc.in /tmp/XX.csv -algorithm clustering.DBSCAN \
-dbscan.epsilon 0.05 -dbscan.minpts 100

The majority cluster is your data with the outliers removed; even the blob near the foot is removed.
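
For readers who want to see the idea outside ELKI, here is a minimal sketch in plain Python of "run DBSCAN, then keep only the largest cluster". It is a brute-force O(n^2) illustration, not ELKI's implementation, and the function names dbscan and remove_outliers as well as the toy data are made up for this example. On the real data you would use the epsilon 0.05 and minpts 100 from the command above; the toy set is much smaller, so it uses a smaller min_pts.

```python
from collections import Counter, deque
from math import dist


def dbscan(points, eps, min_pts):
    """Brute-force DBSCAN. Returns one label per point: -1 = noise, else a cluster id."""
    n = len(points)
    # Precompute eps-neighborhoods (each point counts as its own neighbor).
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None = not visited yet
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        # i is a core point: grow a new cluster from it.
        labels[i] = cluster_id
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core point too, keep expanding
        cluster_id += 1
    return labels


def remove_outliers(points, eps, min_pts):
    """Keep only the points that belong to the largest DBSCAN cluster."""
    labels = dbscan(points, eps, min_pts)
    counts = Counter(label for label in labels if label != -1)
    if not counts:
        return []  # everything was labeled noise
    majority = counts.most_common(1)[0][0]
    return [p for p, label in zip(points, labels) if label == majority]


if __name__ == "__main__":
    # Toy data (made up): a dense 10 x 10 x 3 grid plus two stray points.
    body = [(x * 0.01, y * 0.01, z * 0.01)
            for x in range(10) for y in range(10) for z in range(3)]
    strays = [(1.0, 1.0, 1.0), (-1.0, 0.5, 0.2)]
    cleaned = remove_outliers(body + strays, eps=0.05, min_pts=20)
    print(len(body) + len(strays), "points in,", len(cleaned), "points kept")
```

The brute-force neighbor search is only practical for small inputs; for a full depth scan, an indexed implementation such as the ELKI one is the better choice.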

To speed up the clustering process, you can add the parameters

-db.index tree.spatial.rstarvariants.rstar.RStarTreeFactory \
-pagefile.pagesize 1000 -spatial.bulkstrategy SortTileRecursiveBulkSplit

which yields a runtime of 4.5 seconds here. This obviously is not good enough for real-time operation as on a Kinect, but it is not surprising to see a directed (supervised) classification algorithm outperform an unsupervised method; this is in fact to be expected.

Here is the result of clustering the data set with the parameters above:

[Image: DBSCAN result]

Licensed under: CC-BY-SA with attribution