How to generate Bad Random Numbers
-
29-09-2019 - |
Question
I'm sure the opposite has been asked many times but I couldn't find any answers on how to generate bad random numbers.
I want to write a small program for cluster analysis and want to generate some random Points for testing. If I would just insert 1000 Points with random coordinates they would be scattered all over the field which would make a cluster analysis worthless.
Is there a simple way to generate Random Numbers which build clusters?
I already thought about either not using random()
but random()*random()
which generates normally distributed numbers (I think I read this somewhere here on Stack Overflow).
Second approach would be picking a few areas at random and run the point generation again in this area which would of course produce a cluster in this area.
Do you have a better idea?
Solution
If you are deliberately producing well formed clusters (rather than completely random clusters), you could combine the two to find a cluster center, and then put lots of points around it in a normal distribution.
As well working in cartesian coords (x,y); you could use a radial method to distribute points for a particular cluster. Choose a random angle (0-2PI radians), then choose a radius. Note that as circumference is proportional radius, the area distribution will be denser close to the centre - but the distribution per specific radius will be the same. Modify the radial distribution to produce a more tightly packed cluster.
OR you could use real world derived data for semi-random point distributions with natural clustering. Recently I've been doing quite a bit of geospatial cluster analysis. For this I have used real world data - zipcode centroids (which form natural clusters around cities); and restaurant locations. Another suggestion: you could use a stellar catalogue or galactic catalogue.
OTHER TIPS
Generate few anchors. True random numbers. Then generate noise around them:
anchor + dist * (random() - 0.5))
this will generate clustered numbers, that will be evenly distributed in distance dist
.
- Add an additional dimension to your model.
- Draw an irregular (i.e. not flat) surface.
- Generate numbers in the extended space.
- Discard all numbers which are on one side of the surface.
- From every number left, drop the additional dimension.
Maybe I have misunderstood, but the gnu scientific library (written in c) has many distributions written within it - could you not pick coordinates from the Gaussian/poisson etc from that library?
http://www.gnu.org/software/gsl/manual/html_node/Random-Number-Distributions.html
They provide a simple example with the Poisson distribution from the link, too.
If you need your distribution to be bounded (for example y-coordinate not less than -1) then you can achieve that by rejection sampling from the uniform distribution in the gsl.
Blessings, Tom
My first thought was that you could implement your own using a linear congruential generator and experiment with the coefficients until you get a low enough period to suit your needs. A really low m
coefficient should do the trick.
I also like your second idea of running a good RNG around a few pre-selected points to create clusters. You could either target specific areas for the clusters with this method, or generate those randomly as well.