It depends on what type of differences you are trying to match. The fastest approach I know of is to use shingle matching with MinHash: http://www.stanford.edu/~ashishg/amdm/handouts/scribed-lec10.pdf and http://en.wikipedia.org/wiki/MinHash
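MinHash estimates the Jaccard similarity between the shingle sets of two documents, so it is suited to finding exact or near duplicates, not documents that are only partially similar.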
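To make the idea concrete, here is a rough sketch of shingling plus MinHash in Python. It is only an illustration under my own assumptions, not a tuned implementation: the function names, the 5-character shingle size, the 128 hash functions, and the use of seeded BLAKE2b as the hash family are all choices I made for the example.

```python
import hashlib

def shingles(text, k=5):
    """Return the set of k-character shingles of the normalized text."""
    text = " ".join(text.lower().split())  # lowercase, collapse whitespace
    if len(text) < k:
        return {text}  # short input: treat the whole text as one shingle
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def _hash(shingle, seed):
    """Seeded 64-bit hash; each seed acts as a separate hash function."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def minhash_signature(shingle_set, num_hashes=128):
    """For each hash function, keep the minimum hash over all shingles."""
    return [min(_hash(s, seed) for s in shingle_set)
            for seed in range(num_hashes)]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree (estimates Jaccard)."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)

def jaccard(set_a, set_b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(set_a & set_b) / len(set_a | set_b)

if __name__ == "__main__":
    doc_a = "the quick brown fox jumps over the lazy dog"
    doc_b = "the quick brown fox jumped over the lazy dog"
    sh_a, sh_b = shingles(doc_a), shingles(doc_b)
    sig_a, sig_b = minhash_signature(sh_a), minhash_signature(sh_b)
    print("exact Jaccard:    ", jaccard(sh_a, sh_b))
    print("MinHash estimate: ", estimated_jaccard(sig_a, sig_b))
```

The signatures have a fixed length no matter how long the documents are, which is where the speed comes from: you compare 128 integers instead of the full shingle sets. For a large collection you would normally also bucket the signatures (locality-sensitive hashing) so you do not have to compare every pair, but that is left out here.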