How to compute histograms using Weka

https://stackoverflow.com/questions/10904205

12-06-2021

Question

Given a dataset with 23 points spread out over 6 dimensions, in the first part of this exercise we should do the following, and I am stuck on the second half of this:

Compute the first step of the CLIQUE algorithm (detection of all dense cells). Use three equal intervals per dimension in the domain 0..100, and consider a cell as dense if it contains at least five objects.

Now this is trivial and simply a matter of counting. The next part asks the following though:
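For reference, the counting described above can be sketched in a few lines of Python. This is only a brute-force illustration, not the pruned bottom-up search that CLIQUE itself uses: it bins every coordinate into three equal intervals over 0..100 and counts the points per cell in each subspace. The function names and the toy points are made up for the example and are not the exercise's 23 x 6 dataset.

```python
from collections import Counter
from itertools import combinations


def interval_index(value, low=0.0, high=100.0, bins=3):
    """Map a value in [low, high] to an interval index in 0..bins-1."""
    width = (high - low) / bins
    # Clamp so that value == high lands in the last interval.
    return min(int((value - low) / width), bins - 1)


def dense_cells(points, dims, min_count=5, bins=3):
    """Count points per cell in the subspace `dims`; keep cells with >= min_count points."""
    counts = Counter(
        tuple(interval_index(p[d], bins=bins) for d in dims) for p in points
    )
    return {cell: n for cell, n in counts.items() if n >= min_count}


def all_dense_cells(points, min_count=5, bins=3):
    """Brute force over every non-empty subset of dimensions (no Apriori-style pruning)."""
    n_dims = len(points[0])
    result = {}
    for size in range(1, n_dims + 1):
        for dims in combinations(range(n_dims), size):
            found = dense_cells(points, dims, min_count, bins)
            if found:
                result[dims] = found
    return result


# Hypothetical 2-D toy data, for illustration only.
points = [(5, 10), (12, 20), (20, 30), (25, 15), (30, 5), (70, 80), (90, 40)]
print(all_dense_cells(points))
```

With the real data, `points` would hold the 23 six-dimensional rows, and the result lists every subspace that contains at least one dense cell.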

Identify a way to compute the above CLIQUE result by only using the functions of Weka provided in the tabs of Preprocess, Classify, Cluster, or Associate. Hint: Just two tabs are needed.

I've been trying this for over an hour now, but I can't seem to get anywhere near a solution. If anyone has a hint, or maybe a useful tutorial that gives me a little more insight into Weka, it would be very much appreciated!

Solution

I am assuming you have 23 instances (rows) and 6 attributes (dimensions).

Use three equal intervals per dimension

Use the Preprocess tab to discretize your data into 3 bins, one per interval, with the unsupervised Discretize filter (see the command line below). With useEqualFrequency set to false, which is the default, the bins have equal width. You can also set it to true and try again. I think true may give better results.

weka.filters.unsupervised.attribute.Discretize -B 3 -M -1.0 -R first-last
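To make the difference between the two useEqualFrequency settings concrete, here is a small Python sketch of equal-width and equal-frequency binning. It is not Weka's implementation: the function names and sample values are hypothetical, and the rank-based equal-frequency rule below is a simplification. As far as I know, Weka's equal-width mode takes the range from each attribute's observed minimum and maximum, so the sketch does the same rather than using the fixed 0..100 domain from the exercise.

```python
def equal_width_bins(values, bins=3):
    """Assign each value to one of `bins` equal-width intervals over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        # All values identical: everything falls into the first bin.
        return [0] * len(values)
    # Clamp so that the maximum lands in the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


def equal_frequency_bins(values, bins=3):
    """Assign values to bins that each hold roughly the same number of values (by rank)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * bins // len(values)
    return labels


# Hypothetical attribute values with one outlier.
values = [1, 2, 3, 4, 5, 90]
print(equal_width_bins(values))      # [0, 0, 0, 0, 0, 2]
print(equal_frequency_bins(values))  # [0, 0, 1, 1, 2, 2]
```

Equal-frequency binning moves the bin boundaries to balance the counts, so the resulting cells no longer correspond to three equal intervals. That is worth keeping in mind when comparing the output with the manual CLIQUE count.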


After that, cluster your data. This will show you which instances are near each other. Since you would like to find dense cells, I think a SOM (self-organizing map) may be appropriate.

a cell as dense if it contains at least five objects.

You have 23 instances, so try 2x2 = 4 cluster centers first, then 2x3 = 6, 2x4 = 8 and 3x3 = 9. If your data points are close together, some of the cluster centers should hold at least 5 instances no matter how many cluster centers you choose.
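As a quick sanity check on those grid sizes, the pigeonhole principle gives the minimum size that the largest cluster must have when 23 instances are split over k clusters. The helper name below is made up for the example, and the bound says nothing about how a SOM will actually distribute the points.

```python
from math import ceil


def guaranteed_largest_cluster(n_instances, n_clusters):
    """Pigeonhole bound: the largest cluster holds at least ceil(n / k) instances."""
    return ceil(n_instances / n_clusters)


n_instances = 23
for rows, cols in [(2, 2), (2, 3), (2, 4), (3, 3)]:
    k = rows * cols
    print(f"{rows}x{cols} = {k} clusters: largest holds at least "
          f"{guaranteed_largest_cluster(n_instances, k)} instances")
```

Only the 2x2 grid guarantees a cluster with five or more instances (at least 6). For 6, 8 or 9 centers the guarantee drops to 4, 3 and 3, so whether a cluster still reaches five there depends on how concentrated the data is.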

Licensed under: CC-BY-SA with attribution
