Question
I have a matrix (first.transactions.data) with two columns, id and date, and 12499 rows.
id date
1 19164958 2001-09-01
2 39244924 2001-11-01
3 39578413 2001-09-01
4 40992265 2001-11-01
5 43061957 2001-09-01
6 47196850 2001-11-01
7 51236987 2001-11-01
8 51326773 2001-09-01
9 54271247 2001-09-01
10 70765025 2001-09-01
11 70781923 2001-09-01
12 70782614 2001-09-01
13 70797166 2001-09-01
14 70992941 2001-09-01
15 70995813 2001-09-01
Now I want to write a function that can divide this matrix into n equally long sub-matrices. E.g. with n = 3: a matrix 1/A that contains rows 1 to 5, a second matrix 2/B that contains rows 6 to 10, and a last matrix 3/C that contains rows 11 to 15.
I've tried using split or cut but I encounter several problems with them. E.g.
sub <- split(first.transactions.data, cut(first.transactions.data$id, 10))
Results in:
$`(1.91e+07,2.61e+07]`
id date
1: 19164958 2001-09-01
$`(2.61e+07,3.3e+07]`
Empty data.table (0 rows) of 2 cols: id,date
$`(3.3e+07,4e+07]`
id date
1: 39244924 2001-11-01
2: 39578413 2001-09-01
$`(4e+07,4.7e+07]`
id date
1: 40992265 2001-11-01
2: 43061957 2001-09-01
or
sub <- split(first.transactions.data, sample(rep(1:29, 431)))
yields:
$`1`
id date
1: 71189663 2001-09-01
2: 71307343 2001-09-01
3: 71361917 2001-09-01
4: 71410408 2001-09-01
5: 71518508 2001-09-01
---
427: 88698009 2002-01-01
428: 88698658 2002-01-01
429: 88700541 2002-01-01
430: 88700697 2002-01-01
431: 88701106 2002-01-01
$`2`
id date
1: 71172578 2001-09-01
2: 71608016 2001-09-01
3: 71647277 2001-09-01
4: 71834223 2001-09-01
5: 71998882 2001-09-01
---
427: 88702992 2002-01-01
428: 88703276 2002-01-01
429: 88703439 2002-01-01
430: 88704952 2002-01-01
431: 88705136 2002-01-01
The first command doesn't output equally long parts (I think it's using quantiles and not the number of observations). The second command seems to split the matrix into random samples of the original rows. Additionally, I have to specify both how many parts I want and how long each subset is going to be. Finally, I don't know how to access the content of each sub-matrix.
I want to create those sub-matrices to use them as cohorts. With the cohorts I later want to check in the full data set how many of the IDs are still alive in later periods to calculate the individual's retention rate by cohort.
Can I use the commands split and cut for this, do I need others or is my approach even infeasible in R?
Thank you very much for your time and help.
Patrik
PS: Sorry for my presentation of the matrix. I can't figure out how to edit it properly.
Solution
You indeed need split:
split(first.transactions.data, rep(1:3, each = 5))
(adjust numbers to suit your needs, maybe make them nrow-dependent)
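One way to make the grouping nrow-dependent is sketched below. This is only an illustration, not part of the original answer: the helper name split_into_chunks is made up, and the example uses a plain data.frame built from the 15 rows shown in the question rather than the real 12499-row object. It derives a chunk number for every row from nrow(x), so only n has to be given, and it also shows how the individual pieces can be accessed.

```r
# Stand-in for first.transactions.data: the 15 rows shown in the question,
# stored as a plain data.frame (an assumption; the real object may differ).
first.transactions.data <- data.frame(
  id = c(19164958, 39244924, 39578413, 40992265, 43061957,
         47196850, 51236987, 51326773, 54271247, 70765025,
         70781923, 70782614, 70797166, 70992941, 70995813),
  date = as.Date(c("2001-09-01", "2001-11-01", "2001-09-01", "2001-11-01",
                   "2001-09-01", "2001-11-01", "2001-11-01", "2001-09-01",
                   "2001-09-01", "2001-09-01", "2001-09-01", "2001-09-01",
                   "2001-09-01", "2001-09-01", "2001-09-01"))
)

# Hypothetical helper: split x into n consecutive chunks of (nearly) equal size.
split_into_chunks <- function(x, n) {
  # Chunk number per row (1, 1, ..., 2, 2, ..., n); row order is preserved.
  grp <- ceiling(seq_len(nrow(x)) * n / nrow(x))
  split(x, grp)
}

chunks <- split_into_chunks(first.transactions.data, 3)

# split() returns a named list; each element is one sub-table.
sizes  <- sapply(chunks, nrow)  # number of rows in each chunk
second <- chunks[[2]]           # rows 6 to 10; chunks[["2"]] works as well
```

When nrow(x) is not a multiple of n, the chunk sizes in this sketch differ by at most one row. Since 12499 = 29 * 431, calling it with n = 29 on the full data should give 29 chunks of 431 rows each, in the original row order.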