Edit distance explanation

Question 1

The matrix contains [at the end] the edit distances between all the prefixes of word1 and all the prefixes of word2.

d[i][j] = edit distance between word1[0..(i-1)] and word2[0..(j-1)]

You are interested in d[l1][l2]. In general, to compute d[i][j], you need to look at the three smaller neighbours d[i-1][j], d[i-1][j-1] and d[i][j-1]. So transitively, d[i][j] requires all entries where at least one coordinate is smaller than i respj (and the other not larger). The exception is when two characters word1[i-1] and word2[j-1] are equal, in which case you only need the diagonal smaller neighbour.

If you first fill the matrix with -1 to indicate the corresponding edit distance between the prefixes has not been evaluated, and recursively calculate d[l1][l2], using the cached value of a needed d[i][j] if it has already been computed, computing it recursively and storing its value if not, some regions of the matrix may remain untouched. Possibly large regions if there are many pairs of equal characters [only the diagonal will be evaluated if the two words are equal], only small regions if any if there are few pairs of equal characters.

In the generic case, most of the matrix is needed to compute d[l1][l2], so then it is faster to compute the matrix completely using the simple algorithm than to recur and compute only the actually required values.

If you don't store the values for the shorter prefixes, since they are transitively required to compute d[i][j], they would have to be recalculated for each way d[i-a][j-b] is reached from d[i][j]. Since d[i-a][j-b] can be reached in many ways from d[i][j], that would cause a lot of recalculation leading to a grossly inefficient algorithm in general.

Since the computation for each row only uses the previous row, you could do with just two arrays of length min{l1, l2} + 1 to save some memory, but unless the words are really long, it doesn't make a big difference, and the code is simpler with the full array.

Question 2

Your code is calculating the Levenshtein distance using dynamic programming.

The array d will ultimately contain the solution to various sub-problems, where d[i][j] is the distance between the first i letters of the first word and the first j letters of the second. There is a relationship between d[i][j] and the entries d[i-1][j], d[i][j-1] and d[i-1][j-1]. The algorithm calculates the entries of the table in such a way that required sub-problems have already been calculated (this is the dynamic programming part).

Question 3

The code could be difficult to understand but here are some tips:

Edit distance between any pair of characters in two string is at least edit distance of all the character pairs which has been compared before them. e.g when comparing two string abc and ade when you reach at stage comparing c and e you are sure that the edit distance of the two strings is at least going to be the minimum edit distance found so far (1 in this case)
If two characters being compared are not equal, edit distance is going to increase by one, signifying replacement. So if you know the edit distance of the string before the comparison, you can add 1 to it depending upon whether the characters are equal or not.

Armed with these two facts,

position[i, j] is the edit distance for consideration

position[i-1,j] would be the total edit distance of the string so far if ith row element did not exist (signify insertion cost)

position[i,j-1] would be the total edit distance of the string so far if jth column element was removed (signify deletion cost)

position[i-i,j-1] would be the min edit distance calculated so far for all previous elements

position[i,j] is then calculated by taking minimum of all three possibility and adding current decision.