What is the fastest algorithm in terms of computation time for burst errors (block code) in embedded environment?

Question

I don't think there is any substitute for RS to correct block errors. Convolutional codes are quite fast (~15 amortized instructions per decoded bit for viterbi), but are only suitable for sporadic bit changes. Syndrome for BCH (binary) code (31,5) can be calculated even faster, but it has only capability of recovering 2 bits.

This page discusses the complexity of full software implementations of long RS decoding:

 RS(255,251) - 12 Mbps
 RS(255,239) - 2.7 Mbps
 RS(255,223) - 1.1 Mbps

Mapping 2-error syndrome to error locations achieved 12Mbps speed on 166MHz Pentium (using probably a look up table), which could be said to be 5-10% as efficient as todays typical embedded processors. The complexity of larger polynomials is about O(M*N), where M = block length, N=code length for the syndrome calculation and the Berlekamp-Massey algorithm. The Chien search seems to be often done brute force looping over all possible (255) roots and substituting them one by one to the error location polynomial -- that has also a complexity of about O(M*N), as M=block size = 255 is about the same as the number of codewords (256).