For similarity based on phrase co-occurrence (phrases appearing more often together in documents will be more similar), you can use gensim.
Check out the Latent Semantic Analysis and Latent Dirichlet Allocation there: http://radimrehurek.com/gensim/tut2.html#available-transformations
Depending on what exactly you want your clusters to do, you can either use the LSI/LDA topics directly as clusters, or cluster the latent phrase vectors you get from the model.