Question

Here is my sample data:

    Name       Value
1   Tom         4
2   Dave        2
3   Frank       3
4   Frank       1
5   Dave        1
6   Tom         1
7   Ri          4
8   Ri          5

I need the above data in the following format:

#   Tom   Dave  Frank    Ri
1    1      1     1    0
2    0      1     0    0
3    0      0     1    0
4    1      0     0    1 
5    0      0     0    1

How can I get the data into the required format? Please note that I will be running this code on my big_data, which has 1048576 rows and 2 columns.


Solution

This works:

all_names <- unique(df$Name)
num_cols  <- length(all_names)
num_rows  <- max(df$Value)

mat <- matrix(0L, num_rows, num_cols,
              dimnames = list(NULL, all_names))
mat[cbind(df$Value, match(df$Name, all_names))] <- 1L
mat
#      Tom Dave Frank Ri
# [1,]   1    1     1  0
# [2,]   0    1     0  0
# [3,]   0    0     1  0
# [4,]   1    0     0  1
# [5,]   0    0     0  1
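
To run this end to end, here is a minimal sketch that rebuilds the sample data from the question. It assumes the data sits in a data frame called df with Name stored as character and Value as integer, which the question does not state. It also pulls the index out into its own variable (idx, a name introduced only for illustration) to show why the assignment works: indexing a matrix with a two-column matrix treats each row as one (row, column) cell.

```r
# Sample data from the question; the data frame name and column types are assumptions.
df <- data.frame(
  Name  = c("Tom", "Dave", "Frank", "Frank", "Dave", "Tom", "Ri", "Ri"),
  Value = c(4L, 2L, 3L, 1L, 1L, 1L, 4L, 5L),
  stringsAsFactors = FALSE
)

# Column order follows the first appearance of each name.
all_names <- unique(df$Name)

# Each row of idx is one (row, column) position that should become 1.
idx <- cbind(df$Value, match(df$Name, all_names))

# Start from an all-zero integer matrix with one column per name.
mat <- matrix(0L, nrow = max(df$Value), ncol = length(all_names),
              dimnames = list(NULL, all_names))

# Indexing with a two-column matrix addresses individual cells, not whole rows/columns.
mat[idx] <- 1L

colSums(mat)  # number of distinct values seen for each name
```

Because the assignment is a single vectorized operation, it should scale to a million rows without an explicit loop, provided max(df$Value) times the number of distinct names fits in memory.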

My question about how sparse the matrix is was not an idle one. If it is very sparse, it is in your interest to use a sparse matrix, as it will use far less memory:

library(Matrix)
mat <- sparseMatrix(i = df$Value, j = match(df$Name, all_names), x = 1L,
                    dimnames = list(NULL, all_names))
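
One thing to watch on a large input: sparseMatrix() by default sums the x values of repeated (i, j) pairs, so a Name/Value combination that occurs more than once would yield an entry greater than 1, unlike the dense version. The sketch below is one possible way around that, deduplicating the index pairs first. The toy data, the names idx and smat, and the explicit dims argument are all assumptions for illustration, not part of the original answer.

```r
library(Matrix)

# Toy data with a deliberate duplicate: ("Tom", 4) appears twice.
df <- data.frame(
  Name  = c("Tom", "Dave", "Tom", "Ri"),
  Value = c(4L, 2L, 4L, 5L),
  stringsAsFactors = FALSE
)
all_names <- unique(df$Name)

# Keep one copy of each (row, column) pair so no cell is summed past 1.
idx <- unique(cbind(df$Value, match(df$Name, all_names)))

smat <- sparseMatrix(i = idx[, 1], j = idx[, 2], x = 1,
                     dims = c(max(df$Value), length(all_names)),
                     dimnames = list(NULL, all_names))

max(smat)          # stays at 1 even though the input had a repeated row
object.size(smat)  # on real data, compare with object.size(as.matrix(smat))
```

Whether the sparse form pays off depends on how many cells are actually filled, so it is worth checking object.size() on a slice of the real data before committing to it.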
Licensed under: CC-BY-SA with attribution