Question

I'm doing some clustering research and need to generate synthetic data that would look something like these examples:

[Image: dataset examples]

We have 2D plots with two classes (red and black). How could I generate 2D data like this? It has a V structure, so I was thinking about generating points around straight lines. Is there a way to do that in R? I'm using R but am open to other tools, as long as the data can be exported.


Solution

Here's a thought.

n <- c(200,200)                 # Number of points in each class
cls <- rep(1:2, n)              # Class memberships
i <- c(.2-.12*abs(rnorm(n[1])), # Noiseless x position
       -.2+.12*abs(rnorm(n[2])))
noise <- .04*(.2-abs(i))        # Noise level relative to `i`

# Final sample
x <- cbind(i, abs(.5*i)) + noise*matrix(rnorm(sum(n)*2), sum(n), 2)

plot(x[,1], x[,2], col=cls)


Other tips

Is there any reason to generate this very particular type of data? Any results drawn from this will likely not generalize to other datasets.

Anyway, the obvious way to generate this kind of data is to use a nonlinear projection, e.g. using the famous "abs" function (absolute value).

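That is, project x to (in Python syntax; I don't like R): abs(x), or, if you want some extra randomness: abs(x + .1 * random.random()) + .1 * random.random(). Note that abs is a Python built-in rather than part of the math module, and random.random() takes no arguments, so the .1 scale is applied by multiplication.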
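For anyone who wants to try the absolute-value idea directly, here is a minimal sketch using only the Python standard library. The function names (make_v_data, to_csv), the choice of Gaussian noise, the default noise level, and labelling the two classes by which arm of the V a point belongs to are all assumptions made for illustration; none of them come from the answers above.

```python
import csv
import io
import random


def make_v_data(n_per_class=200, noise=0.05, seed=0):
    """Return rows (x, y, label) scattered around the two arms of y = |x|."""
    rng = random.Random(seed)  # seeded so the data set is reproducible
    rows = []
    # Assumption: class 1 is the right arm (x >= 0), class 2 the left arm.
    for label, sign in ((1, 1.0), (2, -1.0)):
        for _ in range(n_per_class):
            t = sign * rng.random()             # noiseless position on one arm
            x = t + rng.gauss(0.0, noise)       # Gaussian jitter along x
            y = abs(t) + rng.gauss(0.0, noise)  # abs() makes the V, plus jitter
            rows.append((x, y, label))
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["x", "y", "label"])
    writer.writerows(rows)
    return buf.getvalue()


data = make_v_data()
csv_text = to_csv(data)
```

Writing csv_text to a file (opened with newline="") gives something that R or any other tool can read back in, which covers the "exportable" requirement. Raising noise blurs the arms; a noise level that depends on t, as in the R answer, would be a small change inside the loop.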

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow