Question

Suppose I have a very long string str = "abcdef...".

I want to replace a possible-prefix of "xyz" in it, using str.replaceFirst("^xyz","").

What is the expected running-time of the above expression?

Will replaceFirst return immediately after the first character, or iterate throughout the entire string?

P.S.: I need an answer from someone who knows how the Java byte-code interpreter works behind the scene when compiling (or shall I say 'interpreting') the replaceFirst method of the String class.

UPDATE:

Please consider my question regardless of the intermediate string created in order to return the result. This will happen in any case (no matter what String method I use in order to accomplish the desired conversion). I therefore removed the part which refers to the complexity, as it is O(n) in any case...

So to summarize the question:

Can I assume that the running-time of the regex-match itself does not depend on the length of str?

Was it helpful?

Solution

replaceFirst creates a new string by using a StringBuffer on my version of the jdk. So, assuming the length of the regex and the replacement strings are small, it is O(n) due to the string copy, with n the length of the initial:

public String replaceFirst(String replacement) {
    if (replacement == null)
        throw new NullPointerException("replacement");
    reset();
    if (!find())
        return text.toString();
    StringBuffer sb = new StringBuffer();
    appendReplacement(sb, replacement);
    appendTail(sb);
    return sb.toString();
}

Notes:

  • The regex matching in itself should be O(1) though.
  • if there is no match, then replaceFirst goes to return text.toString() which returns the string itself: if there is no match, it is an O(1) operation.
  • startsWith + substring used to be O(1) (when substring was a view on the string, i.e. until Java 7u5) but is now O(n) too (since Java 7y6 substring creates a new string by copying the underlying char array).

Update

I have quickly tested the performance of the operation to confirm the above by matching an empty string and a long string with a regex using a ^ anchor and one without it. The result (in nanosecond per call):

p1s1 (empty string, "^x")   47.561 nsec/op
p1s2 (long string, "^x")    50.753 nsec/op
p2s1 (empty string, "x")    47.526 nsec/op
p2s2 (long string, "x")    131.015 nsec/op

So you can see that the "^x" regex drops the analyses at the first character because the time is (almost) the same with an empty or long string.

Code, using jmh for benhcmarking:

private String s1 = "";
private String s2 = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" +
                    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" +
                    "x";
private static final Pattern p1 = Pattern.compile("^x");
private static final Pattern p2 = Pattern.compile("x");

@GenerateMicroBenchmark
public boolean p1s1() { return p1.matcher(s1).find(); }

@GenerateMicroBenchmark
public boolean p1s2() { return p1.matcher(s2).find(); }

@GenerateMicroBenchmark
public boolean p2s1() { return p2.matcher(s1).find(); }

@GenerateMicroBenchmark
public boolean p2s2() { return p2.matcher(s2).find(); }
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top