Question

I have a matrix (first.transactions.data) with two columns, id and date, and 12499 rows.

    id  date
1   19164958    2001-09-01
2   39244924    2001-11-01
3   39578413    2001-09-01
4   40992265    2001-11-01
5   43061957    2001-09-01
6   47196850    2001-11-01
7   51236987    2001-11-01
8   51326773    2001-09-01
9   54271247    2001-09-01
10  70765025    2001-09-01
11  70781923    2001-09-01
12  70782614    2001-09-01
13  70797166    2001-09-01
14  70992941    2001-09-01
15  70995813    2001-09-01

Now I want to write a function that divides this matrix into n equally long sub-matrices. E.g., with n = 3: a matrix 1/A that contains rows 1 to 5, a second matrix 2/B that contains rows 6 to 10, and a last matrix 3/C that contains rows 11 to 15.

I've tried using split or cut but I encounter several problems with them. E.g.

sub <- split(first.transactions.data, cut(first.transactions.data$id, 10))

Results in:

$`(1.91e+07,2.61e+07]`
     id       date
1: 19164958 2001-09-01

$`(2.61e+07,3.3e+07]`
Empty data.table (0 rows) of 2 cols: id,date

$`(3.3e+07,4e+07]`
         id       date
1: 39244924 2001-11-01
2: 39578413 2001-09-01

$`(4e+07,4.7e+07]`
         id       date
1: 40992265 2001-11-01
2: 43061957 2001-09-01

Or:

sub <- split(first.transactions.data, sample(rep(1:29, 431)))

yields:

$`1`
           id       date
  1: 71189663 2001-09-01
  2: 71307343 2001-09-01
  3: 71361917 2001-09-01
  4: 71410408 2001-09-01
  5: 71518508 2001-09-01
 ---                    
427: 88698009 2002-01-01
428: 88698658 2002-01-01
429: 88700541 2002-01-01
430: 88700697 2002-01-01
431: 88701106 2002-01-01

$`2`
           id       date
  1: 71172578 2001-09-01
  2: 71608016 2001-09-01
  3: 71647277 2001-09-01
  4: 71834223 2001-09-01
  5: 71998882 2001-09-01
 ---                    
427: 88702992 2002-01-01
428: 88703276 2002-01-01
429: 88703439 2002-01-01
430: 88704952 2002-01-01
431: 88705136 2002-01-01

The first command doesn't output equally long parts (I think it's using quantiles and not the number of observations). The second command seems to split the matrix into random samples of the rows of the original matrix. Additionally, I have to specify both how many parts I want and how long each subset is going to be. Finally, I don't know how to access the content of each sub-matrix.

I want to create those sub-matrices to use them as cohorts. With the cohorts I later want to check in the full data set how many of the IDs are still alive in later periods to calculate the individual's retention rate by cohort.

Can I use the commands split and cut for this, do I need others, or is my approach infeasible in R altogether?

Thank you very much for your time and help.

Patrik

PS: Sorry for my presentation of the matrix. I can't figure out how to edit it properly.


Solution

You indeed need split:

split(first.transactions.data, rep(1:3, each = 5))

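(Adjust the numbers to suit your needs; you may want to derive them from nrow(first.transactions.data) instead of hard-coding them. The grouping vector must have one entry per row.)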
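As a rough sketch of how the hard-coded numbers in the call above could be made nrow-dependent, something like the following might work. The helper name split_rows and the toy data frame are made up for illustration and are not part of the original answer; the example uses a plain data.frame, and behaviour on a data.table may depend on the installed data.table version.

```r
# Hypothetical helper (not from the original answer): split a data frame
# into n consecutive chunks with (nearly) equal numbers of rows.
split_rows <- function(df, n) {
  # Row i goes to group ceiling(i * n / nrow), so rows stay in their
  # original order and groups are numbered 1..n.
  groups <- ceiling(seq_len(nrow(df)) * n / nrow(df))
  split(df, groups)
}

# Toy data standing in for first.transactions.data (assumed shape: id, date)
toy <- data.frame(id = 1:15, date = as.Date("2001-09-01") + 0:14)

cohorts <- split_rows(toy, 3)

sapply(cohorts, nrow)  # number of rows in each chunk
cohorts[[2]]           # access the second chunk by position
cohorts[["2"]]$id      # or by name, then pick a column
```

split returns a named list, so each sub-matrix is reachable with [[ ]], either by position or by its name. If nrow is not a multiple of n, the chunk sizes in this sketch differ by at most one row.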

Licensed under: CC-BY-SA with attribution