Question

I have a matrix (200x3) which I want to split into 3 randomly chosen disjoint sets. How can I do this?

I tried the sample function, but it only accepts vectors, and its output is not a subset of the rows of my matrix.

This is what my matrix looks like:

          X1           X2     Y
1   -3.381342627  1.037658397 0
2    3.329754336  1.964180648 0
3    1.760001645 -3.414310545 0
4   -2.450315854 -2.299838395 0
5   -3.334593596  0.069458604 0
6    1.708921101 -2.333932571 0
7   -2.650506645  0.348985289 0
8   -2.935307106 -0.402072990 0
9    2.867566309 -3.217712074 0
10   3.617603017  1.956535384 0

I want to split it into 3 sets like this, with the row numbers chosen at random. I also want to be able to specify the size of each set, for example 4, 4, 2 in this case:

9    2.867566309 -3.217712074 0
3    1.760001645 -3.414310545 0
1   -3.381342627  1.037658397 0
2    3.329754336  1.964180648 0


5   -3.334593596  0.069458604 0
8   -2.935307106 -0.402072990 0
4   -2.450315854 -2.299838395 0
6    1.708921101 -2.333932571 0


10   3.617603017  1.956535384 0
7   -2.650506645  0.348985289 0

Solution

Here is one way:

# a matrix with 3 columns
m <- matrix(runif(300), ncol=3)

# split into a list of data frames (you can convert back with as.matrix)
# note: each row gets a random label, so the set sizes are random too
m_split <- split(as.data.frame(m), sample(1:3, size=nrow(m), replace=TRUE))

# count the number of rows in each set
sapply(m_split, nrow)

# Or, as in the comment below, split by a given number of rows per set.
# The labels are shuffled with sample() so that set membership is random.
nsplit <- c(30,30,40)
m_split2 <- split(as.data.frame(m), sample(rep(1:3, nsplit)))
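
For the sizes asked about in the question (4, 4, 2), a minimal sketch that keeps each piece as a matrix might look like the following. The helper name `split_rows`, the stand-in matrix `m10`, and the seed are assumptions made for illustration; they are not part of the original answer.

```r
# Hypothetical helper: split the rows of a matrix into random,
# disjoint sets of the given sizes.
split_rows <- function(m, sizes) {
  stopifnot(sum(sizes) == nrow(m))
  # one group label per row, in random order
  groups <- sample(rep(seq_along(sizes), times = sizes))
  # split the row indices by label, then subset the matrix;
  # drop = FALSE keeps a one-row set as a matrix
  lapply(split(seq_len(nrow(m)), groups),
         function(i) m[i, , drop = FALSE])
}

set.seed(1)                          # only for reproducibility
m10 <- matrix(rnorm(30), ncol = 3)   # stand-in for the 10 x 3 example
parts <- split_rows(m10, c(4, 4, 2))
sapply(parts, nrow)                  # set sizes: 4 4 2
```

Because every row receives exactly one label, the sets are disjoint and together cover all rows.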

OTHER TIPS

I solved it as follows (maybe not the best way, but it works):

nsamples <- nrow(dat)
# first draw random row indices: 40% of the total number of samples
sampleInd <- sample(nsamples, floor(0.4 * nsamples))
# build the first set from the first half of the drawn indices
half <- floor(length(sampleInd) / 2)
valInd <- sampleInd[1:half]
valSet <- dat[valInd, ]
# the other half
testInd <- sampleInd[(half + 1):length(sampleInd)]
testSet <- dat[testInd, ]
# the unused 60%
trainSet <- dat[-sampleInd, ]
ntrain <- nrow(trainSet)

The percentages can be changed as you wish. The idea is to use the sample function to partition the row indices, and then to use those indices to extract the actual submatrices.
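
The same index-based idea can be sketched for arbitrary proportions. In the example below, the data in `dat`, the names `fractions`, `sizes`, `ind` and `sets`, and the 60/20/20 split are all assumptions chosen for illustration.

```r
# Hypothetical example data; `dat` stands in for the 200 x 3 matrix
set.seed(42)
dat <- matrix(rnorm(600), ncol = 3)

fractions <- c(train = 0.6, val = 0.2, test = 0.2)  # assumed proportions
n <- nrow(dat)

# turn fractions into set sizes; the last set absorbs any rounding remainder
sizes <- round(fractions * n)
sizes[length(sizes)] <- n - sum(sizes[-length(sizes)])

# shuffle all row indices once, then cut them into consecutive chunks
idx <- sample(n)
labels <- rep(names(sizes), times = sizes)
ind <- split(idx, factor(labels, levels = names(sizes)))

# use the index sets to extract the actual submatrices
sets <- lapply(ind, function(i) dat[i, , drop = FALSE])

# sanity check: every row is used exactly once, so the sets are disjoint
stopifnot(!anyDuplicated(unlist(ind)), length(unlist(ind)) == n)
```

Shuffling the indices once and cutting the shuffled vector guarantees disjoint sets, which repeated independent calls to `sample` would not.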

The idea I mentioned in the comments:

# shuffle the row indices
rows <- sample(nrow(m))

# split any way you like, e.g. 4/4/rest
rows.split <- split(rows, c(rep(1,4), rep(2,4), rep(3,nrow(m) - 4 - 4)))

# subset the matrix (drop = FALSE keeps a one-row set as a matrix)
lapply(rows.split, function(x) m[x, , drop = FALSE])
Licensed under: CC-BY-SA with attribution