Question

I'm specifically interested in why the overhead of transferring the data across multiple networks to multiple computers and then back again doesn't slow down the computation so much that a supercomputer would outright obliterate the distributed system in computation speed.


Solution

The simple answer to your title question is: it can't. As long as the supercomputer and the independent computers have similar raw computing power, the supercomputer's interconnect will be faster by a factor of roughly 40 to 400.
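To give a feel for where a factor like that comes from, here is a back-of-the-envelope comparison. The bandwidth and latency figures below are assumptions picked for illustration, not benchmarks of any particular machine:

```python
# Rough comparison of an HPC interconnect vs. commodity networking.
# All numbers are illustrative assumptions, not measurements.

hpc_bandwidth_gbps = 100    # assumed InfiniBand-class link
hpc_latency_us     = 1      # assumed end-to-end latency

commodity_bandwidth_gbps = 1     # assumed 1 GbE link
commodity_latency_us     = 100   # assumed latency across a LAN/WAN hop

bandwidth_factor = hpc_bandwidth_gbps / commodity_bandwidth_gbps
latency_factor   = commodity_latency_us / hpc_latency_us

print(f"Bandwidth advantage: ~{bandwidth_factor:.0f}x")  # ~100x
print(f"Latency advantage:   ~{latency_factor:.0f}x")    # ~100x
```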

Now assume that you won't get a whole supercomputer for your calculations (as is often the case, in academia at least), but only 10 of its nodes. If you know how to parallelize the algorithm to run on 100 nodes, then commodity servers or rented VMs from Amazon will probably be faster; see the sketch below.
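A quick sketch of that trade-off. The node counts come from the scenario above, but the parallel efficiencies are purely assumed for illustration:

```python
# Few well-connected nodes vs. many commodity nodes.
# Per-node speed is taken as equal; efficiencies are assumptions.

supercomputer_nodes = 10
commodity_nodes = 100

supercomputer_efficiency = 0.95  # assumed: fast interconnect wastes little time
commodity_efficiency = 0.50      # assumed: slow network wastes half the time

effective_super = supercomputer_nodes * supercomputer_efficiency      # 9.5
effective_commodity = commodity_nodes * commodity_efficiency          # 50.0

print(effective_super, effective_commodity)
# Even at 50% efficiency, 100 commodity nodes out-compute 10 well-connected ones.
```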

The key here is, as you pointed out, to optimize for network traffic. This starts with simply compressing the data sent around (e.g. GZip compression) and ends with sending small task definitions that require a lot of computation but produce only a short answer in return.
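As a minimal sketch of the compression point, here is how a small task definition might be gzipped before it is sent out. The task fields (`op`, `matrix`, `condition`) are invented for the example:

```python
import gzip
import json

# Shrink a task definition before sending it over the network.
# The task structure here is made up for illustration.
task = {
    "op": "find_permutation",
    "matrix": [[0, 1, 0], [1, 0, 1], [0, 1, 1]],
    "condition": "diagonal_all_ones",  # hypothetical condition name
}

raw = json.dumps(task).encode("utf-8")
compressed = gzip.compress(raw)

print(len(raw), "bytes raw ->", len(compressed), "bytes compressed")
# The receiving worker would gzip.decompress() and json.loads() the payload,
# then spend a long time computing before returning a tiny result.
```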

For instance, you could send a matrix (or part of one), and the task is to find a permutation of that matrix that satisfies a certain condition. The matrix is of size n (and the data sent could even be compressed to make it smaller), but the computation will take n! steps in the worst case.
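Here is a toy worker in that spirit. The condition checked (an all-ones diagonal after reordering the rows) is a hypothetical example chosen only to show the "small input, huge computation, tiny answer" pattern:

```python
from itertools import permutations

def diagonal_all_ones(matrix):
    # Hypothetical condition: every diagonal entry is 1.
    return all(matrix[i][i] == 1 for i in range(len(matrix)))

def solve(matrix):
    # Try every row permutation: n rows -> up to n! candidates to check.
    for perm in permutations(range(len(matrix))):
        candidate = [matrix[i] for i in perm]
        if diagonal_all_ones(candidate):
            return perm  # the answer is just a tuple of n indices
    return None

matrix = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
]
print(solve(matrix))  # input sent: n*n values; output sent back: n indices
```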

Being able to chop the problem into little pieces like this is what allows SETI@Home to reach speeds of around 600 teraFLOPS on average (source: Wikipedia - FLOPS). A supercomputer with that computing power, on the other hand, would cost over 10 million USD.

To clarify: I don't know how SETI@Home works internally; I gave the matrix permutation only as an example of a task where little data is sent but a long computation is necessary.

Licensed under: CC-BY-SA with attribution