Question

I am implementing a typing trainer and would like to create my own String startsWith() method with specific rules. For example, the '-' char should match any long hyphen ('‒', etc.). I'll also add rules for accented characters (e matches é, but é does not match e).

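import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public class TestCustomStartsWith {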
    private static Map<Character, List<Character>> identityMap = new HashMap<>();
    static { // different hyphens: ‒, –, —, ―
        List<Character> list = new LinkedList<>();
        list.add('‒');
        list.add('–'); // etc
        identityMap.put('-', list);
    }

    public static void main(String[] args) {
        System.out.println(startsWith("‒d--", "-"));
    }

    public static boolean startsWith(String s, String prefix) {
        if (s.startsWith(prefix)) return true;
        if (prefix.length() > s.length()) return false;
        int i = prefix.length();
        while (--i >= 0) {
            if (prefix.charAt(i) != s.charAt(i)) {
                List<Character> list = identityMap.get(prefix.charAt(i));
                if ((list == null) || (!list.contains(s.charAt(i)))) return false;
            }
        }
        return true;
    }
}

I could just replace all kinds of long hyphens with the '-' char, but if there are more rules, I'm afraid the replacing will be too slow.


How can I improve this algorithm?

Solution

I don't know all of your custom rules, but would a regular expression work?

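The user is passing in a String. Create a method that converts that String to a regex, e.g.

  1. replace a short hyphen with a class matching short or long hyphens ([-‒]),
  2. do the same for your accents, so e becomes [eé],
  3. anchor the match at the start of the string (^, or Matcher.lookingAt()); note that \b is a word boundary and fails before a leading hyphen.

Then compile this to a regex and give it a go.

Note that the list of replacements could be kept in a Map as suggested by Tobbias. Your code could be something like

public boolean myStartsWith(String testString, String startsWith) {

    // fancyTransformMap: Map<String,String> from a plain character to its class, e.g. "-" -> "[-‒]"
    for (Map.Entry<String,String> me : fancyTransformMap.entrySet()) {
       // replace() is literal; replaceAll() would treat the key as a regex
       startsWith = startsWith.replace(me.getKey(), me.getValue());
    }

    // lookingAt() matches at the start of testString without requiring the whole string to match
    return java.util.regex.Pattern.compile(startsWith).matcher(testString).lookingAt();
}

p.s. I'm not a regex super-guru, so there may be possible improvements.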
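
One thing to watch with this approach: the typed prefix ends up inside a regex, so characters such as '.' or '(' would be read as regex syntax unless they are escaped. Here is a minimal sketch, not a definitive implementation, that builds the pattern one character at a time and quotes every literal. The class name RegexPrefix, the ALTERNATIVES table and its contents are assumptions made up for this illustration; the special characters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

class RegexPrefix {
    // Hypothetical rule table: a typed character -> every character it may stand for.
    private static final Map<Character, String> ALTERNATIVES = new HashMap<>();
    static {
        // '-' may stand for itself, figure dash, en dash, em dash, horizontal bar
        ALTERNATIVES.put('-', "-\u2012\u2013\u2014\u2015");
        // 'e' may stand for itself or e with acute accent
        ALTERNATIVES.put('e', "e\u00e9");
    }

    // Turns each prefix character into a quoted literal, or into a
    // non-capturing alternation of quoted literals if it has alternatives.
    static Pattern toPattern(String prefix) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < prefix.length(); i++) {
            char c = prefix.charAt(i);
            String alts = ALTERNATIVES.get(c);
            if (alts == null) {
                regex.append(Pattern.quote(String.valueOf(c)));
            } else {
                regex.append("(?:");
                for (int j = 0; j < alts.length(); j++) {
                    if (j > 0) regex.append('|');
                    regex.append(Pattern.quote(String.valueOf(alts.charAt(j))));
                }
                regex.append(')');
            }
        }
        return Pattern.compile(regex.toString());
    }

    static boolean startsWith(String s, String prefix) {
        // lookingAt() anchors at the start of s but does not require a full match.
        return toPattern(prefix).matcher(s).lookingAt();
    }
}
```

With this table, RegexPrefix.startsWith("\u2012d--", "-") is true, while a typed "a." only matches text that really begins with "a.". If the same prefix is tested many times, caching the compiled Pattern would avoid rebuilding it on every call.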

OTHER TIPS

I'd think something like a HashMap that maps the undesirable characters to what you want them to be interpreted as might be the way to go if you are worried about performance:

Map<Character, Character> fastMap = new HashMap<>();

// read it as '<long hyphen> can be interpreted as <regular-hyphen>'
fastMap.put('–', '-');
fastMap.put('é', 'e');
fastMap.put('è', 'e');
fastMap.put('?', '?');
...
// and so on

That way you could ask for the value of the key: value = fastMap.get(key).

However, this only works as long as each character has a single interpretation, because every key maps to exactly one value: é can be read as e, but not also as è. If you are worried about performance, this is an exceedingly fast way of doing it, since the lookup time for a HashMap is O(1) on average. But as others on this page have written, premature optimization is often a bad idea: implement something that works first, and optimize only if it turns out to be too slow.
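
To make the map idea concrete, here is a rough sketch of a startsWith that looks up each character of the text as it goes, so no replaced copy of the string is ever built. The class name NormalizingPrefix and the CANONICAL table are hypothetical names chosen for this example, and the entries are only a guess at the asker's rules. Because the lookup is applied to the text and never to the typed prefix, a typed e matches é but a typed é does not match e.

```java
import java.util.HashMap;
import java.util.Map;

class NormalizingPrefix {
    // Hypothetical table: character in the text -> plain character that may be typed for it.
    private static final Map<Character, Character> CANONICAL = new HashMap<>();
    static {
        CANONICAL.put('\u2012', '-'); // figure dash
        CANONICAL.put('\u2013', '-'); // en dash
        CANONICAL.put('\u2014', '-'); // em dash
        CANONICAL.put('\u2015', '-'); // horizontal bar
        CANONICAL.put('\u00e9', 'e'); // e with acute accent
        CANONICAL.put('\u00e8', 'e'); // e with grave accent
    }

    static boolean startsWith(String s, String prefix) {
        if (prefix.length() > s.length()) return false;
        for (int i = 0; i < prefix.length(); i++) {
            char typed = prefix.charAt(i);
            char expected = s.charAt(i);
            if (typed == expected) continue;          // exact match
            Character plain = CANONICAL.get(expected); // one O(1) lookup per mismatch
            if (plain == null || plain != typed) return false;
        }
        return true;
    }
}
```

For example, NormalizingPrefix.startsWith("\u00e9cole", "e") is true and NormalizingPrefix.startsWith("ecole", "\u00e9") is false. The work is proportional to the prefix length, whatever the number of rules in the table.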
Licensed under: CC-BY-SA with attribution