Question

Possible Duplicate:
Counting the swaps required to convert one permutation into another

I'm looking for an algorithm that computes a string distance in which the only allowed operation is the transposition of two adjacent characters. For example:
string1: "mother"
string2: "moterh"
distance: 2 (first swap "h" with "e" and get "motehr" and then "h" with "r" resulting in "moterh")
I know that the Damerau–Levenshtein distance is quite similar to this problem; however, it requires a lot of memory (I'd like it to work quite fast on words of up to 1,000,000 characters). I've already written this:

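int amo = 0; // number of adjacent swaps performed so far

for (int i = 0; i < n; i++)
{
    // Position i already holds the right character.
    if (fromString[i] == toString[i])
        continue;
    char toWhat = toString[i];
    int where = -1;
    // Find the first occurrence of the wanted character at or after i.
    for (int j = i; j < n; j++)
    {
        if (fromString[j] == toWhat)
        {
            where = j;
            break;
        }
    }
    // Move it left to position i, one adjacent swap at a time.
    while (where != i)
    {
        char temp = fromString[where];
        fromString[where] = fromString[where - 1];
        fromString[where - 1] = temp;
        where--;
        amo++;
    }
}
cout << amo << endl;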

Strings are represented as char[n], where n is their length. I'm quite sure there's a way to do it faster, and I'd be very thankful if somebody could tell me how to do it or write some source code (Java/Python/C++ would be best, but anything would be great).

P.S. Please excuse any language mistakes; I'm not a native English speaker and I haven't mastered the language yet.


Solution

Basically you're asking for the edit distance algorithm, but allowing only the transposition (a.k.a. swapping or twiddling) operation. The book "Introduction to Algorithms" gives hints for implementing the twiddle operation in one of the problems at the end of the chapter on dynamic programming. The chapter on dynamic programming in "The Algorithm Design Manual" also contains a complete implementation of the edit distance algorithm in C, without the transposition operation (adding it is, again, one of the proposed exercises at the end of the chapter).

The typical way to implement the edit distance algorithm is dynamic programming, which costs O(mn) time and O(mn) space. As far as I know, there's no way to do it faster (i.e. in less than O(mn) time), but you can certainly do it in less space: the space can be reduced to O(m), given that only the current row and the two previous rows of the table are needed to calculate the cost of a transposition operation.
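
To illustrate the space reduction, here is a minimal sketch of the three-row idea. It assumes the restricted variant of the distance (often called optimal string alignment) with unit costs for insertion, deletion, substitution and adjacent transposition; the function name osaDistance and the cost choices are mine, not something taken from either book.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: edit distance with unit-cost insertion, deletion,
// substitution and adjacent transposition (optimal string alignment),
// keeping only three rows of the DP table instead of all of them.
std::size_t osaDistance(const std::string& a, const std::string& b)
{
    const std::size_t m = a.size(), n = b.size();
    // prev2 = row i-2, prev = row i-1, cur = row i of the table.
    std::vector<std::size_t> prev2(n + 1), prev(n + 1), cur(n + 1);

    for (std::size_t j = 0; j <= n; ++j)
        prev[j] = j; // row 0: j insertions turn "" into b[0..j)

    for (std::size_t i = 1; i <= m; ++i) {
        cur[0] = i; // i deletions turn a[0..i) into ""
        for (std::size_t j = 1; j <= n; ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            cur[j] = std::min({prev[j] + 1,          // deletion
                               cur[j - 1] + 1,       // insertion
                               prev[j - 1] + cost}); // match or substitution
            // Adjacent transposition needs the row two steps back.
            if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
                cur[j] = std::min(cur[j], prev2[j - 2] + 1);
        }
        // Rotate the rows: i-1 becomes i-2, i becomes i-1, cur is reused.
        std::swap(prev2, prev);
        std::swap(prev, cur);
    }
    return prev[n];
}
```

Calling osaDistance("mother", "moterh") with the strings from the question yields 2. The vectors have length n + 1, so passing the shorter string as b keeps the memory proportional to the shorter length. The running time is still O(mn).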

That is, assuming you only need the edit distance. If you need the actual edit operations, you're stuck using O(mn) space for reconstructing the solution with plain dynamic programming. However, you can reduce the space to O(min{m,n}) and still reconstruct the actual edit operations by using Hirschberg's algorithm.
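
As a rough sketch of why reconstruction needs the whole table, the following keeps all O(mn) cells and walks back from the last cell to recover one optimal sequence of operations. The function name editScript and the one-letter operation codes are made up for this example, and it assumes the same unit-cost, restricted-transposition variant; it is not Hirschberg's algorithm.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns one optimal edit script from a to b as a
// string of codes: 'M' match, 'S' substitute, 'I' insert, 'D' delete,
// 'T' transpose two adjacent characters. Uses the full O(mn) table.
std::string editScript(const std::string& a, const std::string& b)
{
    const std::size_t m = a.size(), n = b.size();
    std::vector<std::vector<std::size_t>> d(m + 1, std::vector<std::size_t>(n + 1));
    for (std::size_t i = 0; i <= m; ++i) d[i][0] = i;
    for (std::size_t j = 0; j <= n; ++j) d[0][j] = j;

    for (std::size_t i = 1; i <= m; ++i)
        for (std::size_t j = 1; j <= n; ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            d[i][j] = std::min({d[i - 1][j] + 1, d[i][j - 1] + 1,
                                d[i - 1][j - 1] + cost});
            if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
                d[i][j] = std::min(d[i][j], d[i - 2][j - 2] + 1);
        }

    // Walk back from (m, n) to (0, 0), picking a step that explains d[i][j].
    std::string ops;
    std::size_t i = m, j = n;
    while (i > 0 || j > 0) {
        if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1] &&
            d[i][j] == d[i - 2][j - 2] + 1) {
            ops += 'T'; i -= 2; j -= 2;
        } else if (i > 0 && j > 0 &&
                   d[i][j] == d[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1)) {
            ops += (a[i - 1] == b[j - 1]) ? 'M' : 'S'; --i; --j;
        } else if (i > 0 && d[i][j] == d[i - 1][j] + 1) {
            ops += 'D'; --i;
        } else {
            ops += 'I'; --j; // only insertion is left
        }
    }
    std::reverse(ops.begin(), ops.end()); // steps were collected back to front
    return ops;
}
```

For example, editScript("ab", "ba") returns "T". The walk back reads cells from every row, which is why the rows cannot simply be discarded as in the distance-only version.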

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow