Question

How can I create a sparse matrix from a list of dimension names?

Suppose you have this edge list in a data frame:

  from to weight
1    4  a      1
2    5  b      2
3    6  c      3

It can be created like this:

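from <- factor(4:6)
# 'to' must be a factor too: since R 4.0.0, data.frame() no longer converts
# strings to factors, and levels() of a character vector is NULL
to <- factor(c("a", "b", "c"))
weight <- 1:3
foo <- data.frame(from, to, weight)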

A matrix can be created by first creating an empty matrix filled with 0s, naming the rows and columns, and then filling the values in:

bar <- matrix(
  0,
  nrow = nlevels(foo$from),  # one row per level, matching the dimnames below
  ncol = nlevels(foo$to),    # one column per level
  dimnames = list(levels(foo$from), levels(foo$to))
)
bar[as.matrix(foo[,1:2])] <- foo[,3]

The result looks like this:

  a b c
4 1 0 0
5 0 2 0
6 0 0 3
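The assignment bar[as.matrix(foo[,1:2])] <- foo[,3] works because base R lets you index a matrix that has dimnames with a two-column character matrix: each row of the index is a (row name, column name) pair that selects one element, and as.matrix() turns the two factor columns into exactly such a character matrix. A minimal sketch on a hypothetical 2 x 2 matrix:

```r
# Hypothetical 2 x 2 matrix with row and column names
m <- matrix(0, nrow = 2, ncol = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2")))

# Each row of idx is a (row name, column name) pair
idx <- cbind(c("r1", "r2"), c("c2", "c1"))

# One value per pair; names are matched against dimnames(m)
m[idx] <- c(10, 20)
print(m)
```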

How can I create a sparse matrix?

Solution

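An elegant way is to use the Matrix package, whose sparseMatrix() function takes the integer codes of the factors as row and column indices:

library(Matrix)

bar_sparse <- sparseMatrix(
  i = as.integer(foo$from),  # factor codes, i.e. row positions
  j = as.integer(foo$to),    # factor codes, i.e. column positions
  x = foo$weight,
  dimnames = list(levels(foo$from), levels(foo$to))
)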

Here we go:

  a b c
4 1 . .
5 . 2 .
6 . . 3
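The integer codes are needed because sparseMatrix() expects positions, not labels. A small check, assuming foo is built as in the question with both key columns as factors: the codes of from are 1, 2, 3, while its labels are 4, 5, 6, so passing the labels as i would address rows 4 to 6 instead. The explicit dims argument is an addition to the original code; it keeps a row or column for every factor level, even a level that has no edges.

```r
library(Matrix)

# The example data from the question; both key columns are factors
foo <- data.frame(
  from = factor(4:6),
  to = factor(c("a", "b", "c")),
  weight = 1:3
)

print(as.integer(foo$from))                # codes 1 2 3: valid row positions
print(as.numeric(as.character(foo$from)))  # labels 4 5 6: not row positions

# Codes as positions, levels as names; dims covers every level
bar_sparse <- sparseMatrix(
  i = as.integer(foo$from),
  j = as.integer(foo$to),
  x = foo$weight,
  dims = c(nlevels(foo$from), nlevels(foo$to)),
  dimnames = list(levels(foo$from), levels(foo$to))
)
print(bar_sparse)
```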

Thanks, Martin, for pointing me in this direction.


Solution

As maintainer of the Matrix package: using dimnames for sparseMatrix objects is allowed at construction, and column names are even important, notably for sparse model matrices (e.g. in glmnet). However, for efficiency reasons (and partly for lack of use cases, hence "not yet implemented"), they are not always propagated, e.g., IIRC, in matrix multiplications.

The main reason for this "semi-discouraged" support is that sparse matrices are particularly important when they are very large, in the sense of nrow(.) * ncol(.) being large. In such cases, carrying (and copying!) hundreds of thousands of row (and column) names is costly.
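To get a feel for that cost, one can compare the memory footprint of the same very sparse matrix with and without dimnames. This sketch uses a hypothetical 100,000 x 100,000 matrix with only its diagonal filled; exact sizes depend on the R and Matrix versions, so none are quoted here.

```r
library(Matrix)

n <- 100000L  # hypothetical size: n x n with only n non-zeros

# Same non-zero pattern, without and with row/column names
no_names <- sparseMatrix(i = seq_len(n), j = seq_len(n), x = 1, dims = c(n, n))
with_names <- sparseMatrix(
  i = seq_len(n), j = seq_len(n), x = 1, dims = c(n, n),
  dimnames = list(paste0("row", seq_len(n)), paste0("col", seq_len(n)))
)

# The 2 * n name strings add substantially to the object's size
print(object.size(no_names), units = "Kb")
print(object.size(with_names), units = "Kb")
```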

All caveats aside, I of course acknowledge that you've asked a perfectly valid question, and that you may have no choice for now but to work with row and column names instead of integer indices.

Yes, you are (almost) right: using

M <- Matrix(0, n, m, dimnames = ....)
for(i in ...)
  for(j in ...)
    M[i,j] <- ...

is never a good idea for sparseMatrix objects (i.e. all Matrix objects inheriting from sparseMatrix). Rather, use sparseMatrix(...., dimnames = ..). Note, by the way, that passing the dimnames argument is more efficient than setting colnames and rownames separately afterwards.
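The two patterns can be contrasted on a tiny, hypothetical 3 x 2 example (non-square on purpose, so Matrix(0, ...) gives a plain general sparse matrix). The loop assigns one element at a time into a sparse Matrix, which is what the answer advises against; the single sparseMatrix() call builds the same matrix at once, passing the names through the dimnames argument. At this size both are instant; the difference matters for large matrices.

```r
library(Matrix)

# Hypothetical names and edge list
rn <- c("4", "5", "6")
cn <- c("a", "b")
edges <- data.frame(
  from = factor(c("4", "5", "6"), levels = rn),
  to = factor(c("a", "b", "b"), levels = cn),
  weight = c(1, 2, 3)
)

# Discouraged: element-wise assignment into a sparse Matrix
M1 <- Matrix(0, nrow = length(rn), ncol = length(cn),
             dimnames = list(rn, cn), sparse = TRUE)
for (k in seq_len(nrow(edges))) {
  M1[as.character(edges$from[k]), as.character(edges$to[k])] <- edges$weight[k]
}

# Preferred: one sparseMatrix() call, names given via dimnames
M2 <- sparseMatrix(
  i = as.integer(edges$from),
  j = as.integer(edges$to),
  x = edges$weight,
  dims = c(length(rn), length(cn)),
  dimnames = list(rn, cn)
)
print(M2)
```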

Other suggestions

I presume you know that you can do something as simple as:

for (i in seq_len(nrow(foo))) bar[as.character(foo$from[i]), as.character(foo$to[i])] <- foo$weight[i]

but if you want something more efficient that works with Matrix, you may need to write your own function that fills in the slots of the sparse matrix directly. Something like:

  • convert the from and to columns to factors, with levels ordered however you want
  • sort foo by to, then by from (if you can't guarantee this already holds), and remove duplicates
  • create an empty dgCMatrix bar with the correct dimensions and dimnames
  • set bar@i to as.integer(foo$from) - 1 (0-based row indices)
  • set bar@p to c(0, cumsum(tabulate(as.integer(foo$to), nlevels(foo$to)))) (column pointers)
  • set bar@x to as.numeric(foo$weight)
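The steps above can be sketched with a single new("dgCMatrix", ...) call, which fills the slots of a column-compressed sparse matrix directly and has them validated. The edge list is a hypothetical example with one duplicated edge. With this approach the caller is responsible for the sorting and de-duplication that sparseMatrix() would otherwise take care of, so it is probably only worth it if construction turns out to be a bottleneck.

```r
library(Matrix)

# Hypothetical edge list, unsorted, with the edge (4, a) listed twice
foo <- data.frame(
  from = factor(c(6, 4, 5, 4)),      # levels "4" "5" "6" fix the row order
  to = factor(c("c", "a", "b", "a")),
  weight = c(3, 1, 2, 1)
)

# Column-compressed storage: sort by column (to), then row (from),
# then drop repeated (from, to) pairs
foo <- foo[order(foo$to, foo$from), ]
foo <- foo[!duplicated(foo[, c("from", "to")]), ]

nr <- nlevels(foo$from)
nc <- nlevels(foo$to)

bar <- new("dgCMatrix",
  i = as.integer(foo$from) - 1L,                               # 0-based row indices
  p = c(0L, cumsum(tabulate(as.integer(foo$to), nbins = nc))),  # column pointers
  x = as.numeric(foo$weight),                                   # values must be double
  Dim = c(nr, nc),
  Dimnames = list(levels(foo$from), levels(foo$to))
)
print(bar)
```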
Licensed under: CC-BY-SA with attribution