Question

I'm working on a program that calculates a hit-percentage between two strings (A and B). To get an accurate percentage I'm matching N-Grams with a list of strings that are permutations of String A.

Here's my code

public String[] generatePermutations( String name ){

    String[] perms = new String[ calcN(name.length()) ];
    int nameLen = name.length(),
                cnt = 0;


    for(int i = 0; i < name.length(); i++ ){

        nameLen = name.length()-i;
        for( int ii = 0; ii <= i; ii++){
            perms[cnt++] = name.substring( ii, ii + nameLen );
        }
    }
    return perms;
}

for reference calcN() is below

public int calcN( int n ){
    return ( n * (n+1 )) / 2;
}

Given a String "ABC" this method will generate

{ "A", "B", "C", "AB", "BC", "ABC" }

Since I'm doing this operation thousands of times (perhaps hundreds of thousands) is there any way I can squeeze a few extra cycles out of my CPU? (besides switching to C++ or C). As always, thanks in advance for the advice!

Was it helpful?

Solution

The performance optimization of the method depends in part on the JVM in use. For example, in OpenJDK substring is implemented as:

public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > count) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    if (beginIndex > endIndex) {
        throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
    }
    return ((beginIndex == 0) && (endIndex == count)) ? this :
        new String(offset + beginIndex, endIndex - beginIndex, value);
}

That string constructor is a protected form that is implemented as:

 // Package private constructor which shares value array for speed.
String(int offset, int count, char value[]) {
     this.value = value;
     this.offset = offset;
     this.count = count;
 }

Note that this doesn't need to create a new value (the char[] that backs the String).

On the other hand, as described in Java 7 Performance Tuning Guide this was removed because of a memory leak (the single character substring of a 1000 character long string that gets garbage collected still retains the 1000 character string backing it).

And so, the choice of which jvm you are using could have a significant effect upon the creation of strings from substrings.

Depending on your application, and how critical that performance is, you might consider implementing your own limited version of the String that reimplements the offset and length implementation for substring.

OTHER TIPS

I know, I am not probably helping but I couldn't resist.

Scala one-liner:

(1 to 3).flatMap("ABC".combinations(_))

returns

Vector(A, B, C, AB, AC, BC, ABC)

and

"ABC".permutations.toList

returns

List(ABC, ACB, BAC, BCA, CAB, CBA)

It is very easy to use Scala on top of JVM.

It would not help much, but I can not see the use of cnt, which simply use ii

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top