Question

I have six string variables say str11, str12, str13, str21, str21 and str23.

I need to compare combination of these variables.

The combinations I have to check is str11 -- str12 -- str13 as one group and str21 -- str22 -- str23 as other group. I have to compare these two groups.

Now I'm in confusion which method should I use for comparison?

Can I append strings of same group and compare, which is only one comparison say ( str11 append str12 append str13 ) eqauls ( str21 append str22 append str23 )

Or

Should I go for individual 3 comparisons?

if( str11 equals str21 ) {

    if( str12 equals str22 ) {

        if( str13 equals str23 ) {

        }

    }

}

What is performance factor which costs me because of string length when I do string comparison? Lets us assume all strings are of same( approx ) length.

Was it helpful?

Solution

I’d test individually.

Is “AB” “CD” “EF” equal to “ABC” “DE” “F”?

Me thinks not.

P.S. If it is, then it’s a VERY special case, and if you decide to code it that way (as a concatenated comparison) then comment the hell out of it.

OTHER TIPS

Splitting the comparison into three if statements is definitely not necessary. You could also simply do an AND with your comparisons, eg

if (  str11 equals str21
   && str12 equals str22
   && str13 equals str23) ...

Your variable names indicate a major code smell. It sounds like instead of having six variables, you should instead have two arrays, each containing three strings. In other words, something like this initially would have been much better:

String[][] strs = new String[2][3];
strs[0][0] = str11;
strs[0][1] = str12;
...

Chances are that depending where you obtained the six strings from, you will not need to do this manually immediately before the comparison, but can likely pass in your arguments in a format that is more friendly.

If you do wish to do this by comparing arrays of the string objects, and you are using Java 1.5 or above, remember that you have access to the java.util.Arrays.equals() methods for array equality. Using library methods as much as possible is a great way to avoid extra effort reinventing the wheel, and possible implementation mistakes (both submitted implementations so far have bugs, for example).

The exact route you take probably depends on the domain you are writing for - if your particular problem requires you to always compare 3-tuples, then writing the code to explictly compare groups of three strings would not be such a good idea, as it would probably be more immediately understandable than code that compared arrays of arbitrary length. (If you are going this route, then by all means us a single if() conditional with && instead of nested if blocks, as Adam Bellaire demonstrated).

In general though, you'll have a much more reuable block of code if you set it up to work with arrays of arbitrary length.

Appending the strings together and comparing will not work. For instance, strings 1 and 2 could be empty and string 3 could contain "gorps", while string 4 contains "gorps" and 5 and 6 are empty. A comparison of the appended results would return true, though that would be a false positive. You would have to come up with a delimiter you guarantee would not be contained in any string to get this to work, and that could get messy.

I would just do the comparison the way you are doing it. It's readable and straightforward.

The iteration over over one large char[] is probably faster than iteration over n separate string of a total equal length. This is because data is very local and the CPU has an easy time to prefetch data.

However, when you concatenate multiple strings in Java you will use StringBuilder/Buffer and then convert i back to a String in several cases. This will cause increased memory allocation due to how SB.append() works and Java String being immutable, which in turn can create a memory bottleneck and slow down your application significantly.

I would recommend keeping the Strings as is and do separate comparison. The gain in performance due to a longer char[] most likely is far less than the problems you can run in to with the higher allocation rate.

With all respect: I think your code and question not only smells a bit, but almost stinks (big smiley here).

1) the variable names indicate actually having string-vectors around; as already mentioned
2) the question of individual compares vs. a concatenated compare raises the question of how you define equality of your string-tuples; also already mentioned.

But what strikes me most:

3) To me that looks like a typical case of "premature optimization" and counting CPU cycles at the wrong place.

If you really care for the performance, forget about the cost of 3 individual compares against a single compare. Instead:

How about the added overhead of creating two concatenated strings ?

  (str11 + str12 + str13) = (str21 + str22 + str23)

Lets analyze that w.r.t. to the memory manager and operations to be done. On the low level, that translates is 4 additional memory allocations, 2 additional strcpy's, and either another 4 additional strcat or strcpy (depending on how the VM does it; but most would use another strcpy) operations. Then a single compare is called for, which does not first count the characters using strlen; instead it either knows the size in advance (if the object header also includes the number of chars, which is likely) or it simply runs up to a 0-byte. That is called once vs. 3 times. The actual number of chars to compare is roughly the same (forget about the extra 0-bytes). That leaves us with 2 additional calls to strcmp (a few nS), vs. the overhead I described above (a few uS). If we add up the GC reclamation overhead (0 allocations vs. 4), I'd say that your "optimized" solution can easily be a 100 to 1000 times slower than the 3 strcmps !

Additional Notice:
Theroretically, the JITter could optimize it or some of it, and actually generate code as suggested by Adam Bellaire, but I doubt that any JIT-developer cares to optimize such code. By the way, the system's string routines (aka String operations) are usually MUCH faster than handcoding, so do not start to loop over individual characters yourself.

I would add the two groups in two arrays, and then loop over the arrays to compare the individual strings in that array. A good example is already in the ansewers, given by Markus Lausberg.

I would not be concerned about performance costs. Just write it in the most readable way possible. The Java compiler is very good in performance optimizations.

Example method:

    public boolean compareGroups(String[] group1, String[] group2){
    if (group1.length != group2.length ){
        return false;
    }

    for (int i = 0; i < group1.length; i++) {
        if (!group1[i].equals(group2[i])){
            return false;
        }
    }

    return true;
}

And calling the method is ofcourse simple:

        String[] group1 = new String[]{"String 1", "String 2", "String 3"};
    String[] group2 = new String[]{"String 1", "String 2", "String 3"};

    boolean result = compareGroups(group1, group2);

i would use the simple way

dynamic run over all array elements of both arrays.

            boolean isEqual = true;
            for(int n = 0;n<str1.length;++n){
                isEqual &= str1[n].equals(str2[n]);
            }

            return isEqual;
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top