Since HTML is in essence just text-based markup, the easiest approach is the Levenshtein distance. This algorithm measures the difference between two input strings by counting one point for every insertion, deletion, or substitution of a single character, and takes the smallest total needed to turn one string into the other.
Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertion, deletion, substitution) required to change one word into the other.
A sample implementation for Java can be found here.
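For reference, here is a minimal sketch of the usual dynamic-programming approach. The class name `Levenshtein` and method name `distance` are my own choices for illustration, not taken from the linked implementation. It compares Java `char` values (UTF-16 code units) and keeps only two rows of the table in memory.

```java
final class Levenshtein {
    private Levenshtein() {}

    // Returns the minimum number of single-character insertions,
    // deletions, and substitutions needed to turn a into b.
    // Hypothetical helper for illustration; compares UTF-16 code units.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Turning the empty prefix of a into the first j chars of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            // Turning the first i chars of a into the empty string takes i deletions.
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

For example, `Levenshtein.distance("kitten", "sitting")` gives 3: two substitutions and one insertion. Note that running time grows with the product of the two lengths, so this can get slow for large HTML documents.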
By dividing the Levenshtein distance by the length of the longer input string, you get a value between 0 and 1, which you can multiply by 100 to express the difference between the two strings as a percentage.
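The normalization could look roughly like this. The names `StringDifference` and `differencePercentage` are made up for this sketch, and treating two empty strings as 0% different is my own assumption, since the division is otherwise undefined. A compact distance function is included so the snippet runs on its own.

```java
final class StringDifference {
    private StringDifference() {}

    // Difference between a and b as a percentage in the range 0..100.
    // Assumption: two empty strings count as 0% different.
    static double differencePercentage(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 0.0;
        }
        // The distance never exceeds the longer length, so the ratio stays within 0..1.
        return 100.0 * distance(a, b) / longest;
    }

    // Plain Levenshtein distance using a single rolling row.
    private static int distance(String a, String b) {
        int[] row = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            row[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int diagonal = row[0]; // value of the cell up-left of the current one
            row[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int above = row[j];
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                row[j] = Math.min(Math.min(row[j - 1] + 1, above + 1), diagonal + cost);
                diagonal = above;
            }
        }
        return row[b.length()];
    }
}
```

With this, `StringDifference.differencePercentage("kitten", "sitting")` is 3 / 7 * 100, or roughly 42.9, while identical strings give 0 and strings with no characters in common at any position give 100.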