Question

I have a really big feed file with a lot of columns. I will represent one of the columns as a list of strings, and I want to check these strings...

Let's say we have these string values (in one column); the feed is obviously fictional :):

"Gi Joe Black Viper"
"Street Fighter...Ken"
"Mortal Kombat, Scorpion"
"Gi Joe Desert Fox"
"Mortal Kombat, Sub Zero"
"Street Fighter...Ryu"

I want to find the matches between the strings. To simplify, the task is to find a substring of one string that also occurs in another string, and to collect these substrings into a HashSet.

So basically the resulting tags would be:

Gi Joe 
Mortal Kombat 
Street Fighter

I wrote some simple code to test the algorithm, but I want to minimize the time complexity of this task; space complexity matters much less than time. (Think of a feed of about 10,000 lines, so low time complexity is essential.) My code is below:

    String[] stringArray = new String[6];
    stringArray[0] = "Mortal Kombat - Scorpion";
    stringArray[1] = "Street Fighter - Ken";
    stringArray[2] = "Mortal Kombat - Scorpion";
    stringArray[3] = "Gi Joe - Desert Fox";
    stringArray[4] = "Gi Joe - Desert Dog";
    stringArray[5] = "Street Fighter - Ryu";

    HashSet<String> commonStrings = new HashSet<>();

    for (int i = 0; i < stringArray.length; i++) {
        String[] splittedString = stringArray[i].split("[ ]");
        System.out.println("i" + i);
        for (int j = 0; j < stringArray.length; j++) {
            System.out.println("j" + j);
            String matchable = "";
            for (int k = 0; k < splittedString.length; k++) {
                System.out.println("k" + k);
                // Grow the candidate one word at a time, starting with the first word.
                if (k == 0) {
                    matchable = splittedString[k];
                } else {
                    matchable = matchable + " " + splittedString[k];
                }
                if (j != i) {
                    System.out.println("stringArray[" + j + "](" + stringArray[j] + ").indexOf(" + matchable + ") is " + stringArray[j].indexOf(matchable));
                    // indexOf returns 0 for a match at the start, so test >= 0.
                    if (stringArray[j].indexOf(matchable) >= 0) {
                        commonStrings.add(matchable);
                    }
                }
            }
        }
    }

Any suggestions to improve my code are appreciated. Thank you!


Solution

Your complexity is quadratic; it can be reduced to O(n) by using a hash map like this:

    Map<String, Integer> counts = new HashMap<String, Integer>();

    for (String line : stringArray) {
        for (String part : line.split("-")) {
            String s = part.trim(); // drop the spaces around the separator
            Integer currentCount = counts.get(s);
            if (currentCount == null)
                counts.put(s, 1);
            else
                counts.put(s, currentCount + 1);
        }
    }
    // Look in counts for all keys with a value larger than 1.

This can still be optimized (although that will not reduce the complexity) by improving the else branch ;).
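One possible way to tighten the if/else update, assuming Java 8 or later, is `Map.merge`, which inserts 1 for a new key and otherwise adds 1 to the stored count. The sketch below is only an illustration: the class name `TagCounter` and the method `repeatedParts` are made up for this example, and it assumes the fields are separated by "-" as in the test array from the question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of the original answer.
class TagCounter {
    // Returns every "-"-separated part that occurs more than once across all lines.
    static Set<String> repeatedParts(String[] lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String part : line.split("-")) {
                String s = part.trim();
                if (s.isEmpty()) {
                    continue; // skip empty pieces such as those produced by "--"
                }
                // One call replaces the get / null check / put sequence.
                counts.merge(s, 1, Integer::sum);
            }
        }
        // Keep only the keys seen at least twice; TreeSet gives a stable, sorted order.
        Set<String> repeated = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                repeated.add(entry.getKey());
            }
        }
        return repeated;
    }
}
```

Note that this counts every part, not only the leading one, so for the array in the question "Scorpion" is reported as well, because that line appears twice.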

OTHER TIPS

You can split the strings into words and sort them, then iterate over the sorted list. The result should be the same. Of course, this only works for whole-word checks. Instead of sorting, you can use a dedicated data structure.
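A minimal sketch of the split-and-sort idea, under some assumptions: words are taken to be runs of letters, digits, or underscores (split on `\W+`), and the class name `SortedWordScan` and method `repeatedWords` are hypothetical. After sorting, equal words sit next to each other, so a single pass comparing neighbours finds every repeated word. Sorting costs O(m log m) for m words, so this is not faster than the hash map approach.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper illustrating the sort-then-scan tip.
class SortedWordScan {
    // Returns, in sorted order, every word that occurs at least twice overall.
    // A word repeated inside a single line also counts as repeated.
    static List<String> repeatedWords(String[] lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split("\\W+")) {
                if (!word.isEmpty()) { // split can yield a leading empty string
                    words.add(word);
                }
            }
        }
        Collections.sort(words);
        // Equal words are now adjacent; the set removes duplicates from longer runs.
        Set<String> repeated = new LinkedHashSet<>();
        for (int i = 1; i < words.size(); i++) {
            if (words.get(i).equals(words.get(i - 1))) {
                repeated.add(words.get(i));
            }
        }
        return new ArrayList<>(repeated);
    }
}
```

This reports single words such as "Mortal" and "Kombat" rather than the full tag "Mortal Kombat", which is the whole-word limitation mentioned above.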

Licensed under: CC-BY-SA with attribution