Question

Suppose I have 60 words and I want to check if an input is any of these words

Which is faster:
1) making a regex and ORing the words in it
2) looping on an array and searching?

Était-ce utile?

La solution

You can easily test this yourself. Out of curiosity, I created a test case of four different scenarios:

  1. Pattern.matcher().matches() with an on-demand Pattern instance (created for each run)
  2. Pattern.matcher().matches() with a cached Pattern instance (created before all runs)
  3. String.equals() for each element in the array, executed within a loop
  4. Set.contains() on a cached Set (created before all runs)

Data Set: Input array containing 6000 randomly generated strings of 6 characters each. Each test was executed 10,000 times, the results of all runs were totaled and averaged.

The results (all times in ms - lower is obviously better). The first number is the total execution time of all 10,000 runs, the second number is the average of each run:

On-Demand Regex:    12934 (1.29 avg)
Pre-compile Regex:    458 (0.05 avg)
Loop:                  77 (0.01 avg)
Set.contains:           4 (0.00 avg)

Long story short: if you're going to use a regular expression (which you shouldn't) at least create and cache the Pattern. But assuming performance is what matters, you're not going to beat Set.contains() if you know the list of words ahead of time.

Note The On-demand regex test includes the cost of constructing the StringBuilder instance that is given to the Pattern.compile() method, so not necessarily all of the extra time is spent in regex compilation. The Set.contains test also has a slight advantage in that it's inlined, and avoids the extra stack creation of the method call. I modified the test to have that execute inside of a separate method, but it didn't materially affect the results.

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top