Question

I'm doing some clustering research and need to generate synthetic data that would look something like these examples:

[Figure: dataset examples]

The plots are 2D with two classes (red and black). How could I generate 2D data like this? The data has a V-shaped structure, so I was thinking of generating points around straight lines. Is there a way to do that in R? I'm using R but am open to other tools, as long as the data can be exported.


Solution

Here's a thought.

n <- c(200,200)                 # Number of points in each class
cls <- rep(1:2, n)              # Class memberships
i <- c(.2-.12*abs(rnorm(n[1])), # Noiseless x position
       -.2+.12*abs(rnorm(n[2])))
noise <- .04*(.2-abs(i))        # Noise scale: largest at i = 0, zero at |i| = .2

# Final sample
x <- cbind(i, abs(.5*i)) + noise*matrix(rnorm(sum(n)*2), sum(n), 2)

plot(x[,1], x[,2], col=cls)


Other answers

Is there any reason to generate this very particular type of data? Any results drawn from this will likely not generalize to other datasets.

Anyway, the obvious way to generate this kind of data is to use a nonlinear projection, e.g. using the famous "abs" function (absolute value).

That is, map x to (in Python syntax; I don't like R): abs(x), or, if you want some extra randomness: abs(x + .1*random.random()) + .1*random.random()
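To make that concrete, here is a minimal sketch of the abs() idea using only the Python standard library. The names make_v_data and write_csv, the Gaussian noise, the default noise level, the class assignment by the sign of x, and the file name v_data.csv are all assumptions for illustration, not part of the answer above.

```python
import csv
import random


def make_v_data(n_per_class=200, noise=0.02, seed=0):
    """Return rows (x, y, cls) scattered around the V-shaped curve y = |x|.

    Class 1 is drawn on the left arm (x < 0), class 2 on the right arm (x > 0).
    Gaussian noise is an assumption; any zero-mean noise would do.
    """
    rng = random.Random(seed)  # local generator, so runs are reproducible
    rows = []
    for cls, sign in ((1, -1.0), (2, 1.0)):
        for _ in range(n_per_class):
            x = sign * rng.uniform(0.0, 1.0)  # position along the x axis
            # abs() folds the line into a V; noise is added before and after
            y = abs(x + rng.gauss(0.0, noise)) + rng.gauss(0.0, noise)
            rows.append((x, y, cls))
    return rows


def write_csv(rows, path):
    """Write the rows to a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["x", "y", "cls"])
        writer.writerows(rows)


if __name__ == "__main__":
    write_csv(make_v_data(), "v_data.csv")
```

The resulting file can be loaded back into R with read.csv("v_data.csv") and plotted with col = cls, which covers the requirement that the data be exportable.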

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow