Question

I'm sure the opposite has been asked many times but I couldn't find any answers on how to generate bad random numbers.

I want to write a small program for cluster analysis and want to generate some random points for testing. If I just inserted 1000 points with random coordinates, they would be scattered all over the field, which would make a cluster analysis worthless.

Is there a simple way to generate random numbers that form clusters?

I already thought about using random()*random() instead of random(), which supposedly generates normally distributed numbers (I think I read this somewhere here on Stack Overflow).

A second approach would be to pick a few areas at random and run the point generation again within each area, which would of course produce a cluster there.

Do you have a better idea?


Solution

If you are deliberately producing well-formed clusters (rather than completely random clusters), you could combine your two ideas: pick a cluster center at random, and then put lots of points around it following a normal distribution.
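A minimal Python sketch of that idea, assuming a rectangular field and the same spread for every cluster. The function name and its parameters are made up for illustration, and points near the border may land slightly outside the field:

```python
import random

def gaussian_clusters(n_clusters, points_per_cluster, width, height, spread, seed=None):
    """Return a list of (x, y) points grouped around random centers."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_clusters):
        # Pick the cluster center uniformly inside the field.
        cx = rng.uniform(0, width)
        cy = rng.uniform(0, height)
        for _ in range(points_per_cluster):
            # Scatter points around the center with a normal distribution.
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
    return points

# 5 clusters of 200 points each on a 100 x 100 field.
points = gaussian_clusters(n_clusters=5, points_per_cluster=200,
                           width=100.0, height=100.0, spread=3.0, seed=42)
```

A smaller spread gives tighter clusters. Passing a seed makes the test data reproducible.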

As well as working in Cartesian coordinates (x, y), you could use a radial method to distribute points for a particular cluster. Choose a random angle (0 to 2π radians), then choose a radius. Note that because circumference is proportional to radius, a uniformly chosen radius makes the points denser per unit area close to the centre, while the number of points at each radius stays the same. Modify the radial distribution to produce a more tightly packed cluster.
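Here is a rough sketch of the radial method in Python. The function name and the uniform_area switch are my own additions for illustration. With a uniformly chosen radius the cluster is denser in the middle; taking the square root of the uniform draw is the standard way to spread the points evenly over the disc instead:

```python
import math
import random

def radial_cluster(cx, cy, max_radius, n, uniform_area=False, seed=None):
    """Return n points around (cx, cy) using a random angle and radius."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        u = rng.random()
        # A uniform radius crowds points near the center; the square
        # root of u spreads them evenly over the area of the disc.
        r = max_radius * (math.sqrt(u) if uniform_area else u)
        points.append((cx + r * math.cos(angle), cy + r * math.sin(angle)))
    return points

dense_cluster = radial_cluster(50.0, 50.0, 10.0, 500, seed=1)
```

Swapping u for something like u ** 2 would pull the points even closer to the centre.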

Or you could use real-world data to get semi-random point distributions with natural clustering. Recently I've been doing quite a bit of geospatial cluster analysis. For this I have used real-world data: ZIP code centroids (which form natural clusters around cities) and restaurant locations. Another suggestion: you could use a stellar catalogue or galactic catalogue.

OTHER TIPS

Generate a few anchors as true random numbers. Then generate noise around them:

anchor + dist * (random() - 0.5)

This will generate clustered numbers that are evenly distributed over an interval of width dist centered on each anchor.
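Applied to 2D points, that could look roughly like the following Python sketch. The function name, the square field, and the parameter names are assumptions for illustration; the formula is applied to each coordinate separately, so every cluster is a uniformly filled square:

```python
import random

def anchored_points(n_anchors, points_per_anchor, field, dist, seed=None):
    """Return (anchors, points) with points scattered uniformly near anchors."""
    rng = random.Random(seed)
    anchors = [(rng.uniform(0, field), rng.uniform(0, field))
               for _ in range(n_anchors)]
    points = []
    for ax, ay in anchors:
        for _ in range(points_per_anchor):
            # anchor + dist * (random() - 0.5), once per coordinate
            x = ax + dist * (rng.random() - 0.5)
            y = ay + dist * (rng.random() - 0.5)
            points.append((x, y))
    return anchors, points

anchors, points = anchored_points(n_anchors=4, points_per_anchor=250,
                                  field=100.0, dist=8.0, seed=3)
```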

  • Add an additional dimension to your model.
  • Draw an irregular (i.e. not flat) surface.
  • Generate numbers in the extended space.
  • Discard all numbers which are on one side of the surface.
  • From every number left, drop the additional dimension.
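One way to read these steps, sketched in Python: treat the surface as a height z = surface(x, y), generate uniform (x, y, z) triples, keep only those below the surface, and then drop z. The surface used here (two bumps) is purely a made-up example, and the whole thing is an assumption about what the steps mean:

```python
import math
import random

def surface(x, y):
    """Hypothetical irregular surface: two bumps with heights below 1."""
    bump1 = 0.9 * math.exp(-((x - 0.3) ** 2 + (y - 0.3) ** 2) / 0.01)
    bump2 = 0.6 * math.exp(-((x - 0.7) ** 2 + (y - 0.6) ** 2) / 0.02)
    return bump1 + bump2

def sample_under_surface(n, seed=None):
    """Return n (x, y) points in the unit square, dense where the surface is high."""
    rng = random.Random(seed)
    kept = []
    while len(kept) < n:
        # Generate a point in the extended (x, y, z) space.
        x, y, z = rng.random(), rng.random(), rng.random()
        # Discard everything above the surface, then drop z.
        if z < surface(x, y):
            kept.append((x, y))
    return kept

points = sample_under_surface(1000, seed=0)
```

The higher the surface at a location, the more points survive there, so the bumps become the clusters. Most candidates are discarded when the surface is low almost everywhere, which makes this slower than the other approaches.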

Maybe I have misunderstood, but the GNU Scientific Library (written in C) includes many distributions. Could you not pick coordinates from the Gaussian, Poisson, or another distribution in that library?

http://www.gnu.org/software/gsl/manual/html_node/Random-Number-Distributions.html

The linked page also provides a simple example using the Poisson distribution.

If you need your distribution to be bounded (for example, a y-coordinate no less than -1), then you can achieve that by rejection sampling from the uniform distribution in the GSL.
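To illustrate the bounding idea without pulling in the GSL, here is a small Python sketch using the standard random module. It uses the simplest form of rejection, redrawing a Gaussian value until it satisfies the bound, which is my own simplification rather than the GSL-based approach above. The function name and parameters are hypothetical:

```python
import random

def bounded_gauss(rng, mean, sigma, lower):
    """Draw from a normal distribution, rejecting values below `lower`."""
    while True:
        value = rng.gauss(mean, sigma)
        if value >= lower:
            return value

rng = random.Random(7)
# y-coordinates centered on 0 that are never less than -1.
ys = [bounded_gauss(rng, 0.0, 1.0, lower=-1.0) for _ in range(1000)]
```

This gets slow if the bound sits far out in the tail of the distribution, because almost every draw is then rejected.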

Blessings, Tom

My first thought was that you could implement your own using a linear congruential generator and experiment with the coefficients until you get a low enough period to suit your needs. A really low m (the modulus) should do the trick.
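A tiny Python sketch of such a generator, with coefficients picked only for illustration (a=5, c=3, m=16 are assumptions, not recommended values). Because the sequence can take at most m distinct values, the points it produces keep repeating a small set of positions rather than forming blob-shaped clusters:

```python
def lcg(seed, a=5, c=3, m=16):
    """Yield values in [0, 1) from the recurrence x -> (a*x + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m

def lcg_points(n, seed=1, **params):
    """Return n (x, y) points built from consecutive generator outputs."""
    gen = lcg(seed, **params)
    return [(next(gen), next(gen)) for _ in range(n)]

points = lcg_points(1000)
distinct = set(points)  # far fewer than 1000 different positions
```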

I also like your second idea of running a good RNG around a few pre-selected points to create clusters. You could either target specific areas for the clusters with this method, or generate those randomly as well.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow