You're using LIL matrices, which (unfortunately) have a linear-time insertion algorithm. Therefore, constructing them in this way takes quadratic time. Try a DOK matrix instead, those use hash tables for storage.
However, if you're interested in boolean term occurrences, then computing the co-occurrence matrix is much faster if you have a sparse term-document matrix. Let A
be such a matrix of shape (n_documents, n_terms)
, then the co-occurrence matrix is
A.T * A