I'd be grateful if people could help me find an efficient way (probably low memory algorithm) to tackle the following problem.
I need to find the stationary distribution x
of a transition matrix P
. The transition matrix is an extremely large, extremely sparse matrix, constructed such that all the columns sum to 1. Since the stationary distribution is given by the equation Px = x
, then x
is simply the eigenvector of P
associated with eigenvalue 1.
I'm currently using GNU Octave to both generate the transition matrix, find the stationary distribution, and plot the results. I'm using the function eigs()
, which calculates both eigenvalues and eigenvectors, and it is possible to return just one eigenvector, where the eigenvalue is 1 (I actually had to specify 1.1, to prevent an error). Construction of the transition matrix (using a sparse matrix) is fairly quick, but finding the eigenvector gets increasingly slow as I increase the size, and I'm running out of memory before I can examine even moderately sized problems.
My current code is
[v l] = eigs(P, 1, 1.01);
x = v / sum(v);
Given that I know that 1 is the eigenvalue, I'm wondering if there is either a better method to calculate the eigenvector, or a way that makes more efficient use of memory, given that I don't really need an intermediate large dense matrix. I naively tried
n = size(P,1); % number of states
Q = P - speye(n,n);
x = Q\zeros(n,1); % solve (P-I)x = 0
which fails, since Q is singular (by definition).
I would be very grateful if anyone has any ideas on how I should approach this, as it's a calculation I have to perform a great number of times, and I'd like to try it on larger and more complex models if possible.
As background to this problem, I'm solving for the equilibrium distribution of the number of infectives in a cattle herd in a stochastic SIR model. Unfortunately the transition matrix is very large for even moderately sized herds. For example: in an SIR model with an average of 20 individuals (95% of the time the population is between 12 and 28 individuals), P
is 21169 by 21169 with 20340 non-zero values (i.e. 0.0005% dense), and uses up 321 Kb (a full matrix of that size would be 3.3 Gb), while for around 50 individuals P
uses 3 Mb. x
itself should be pretty small. I suspect that eigs()
has a dense matrix somewhere, which is causing me to run out of memory, so I should be okay if I can avoid using full matrices.