Question

Is there a way to rewrite this regular expression so that it does not include a lookahead for "/js"?

Is this even something that I should worry about in terms of performance? It is being used to filter HTTP requests.

\.(asmx(?!/js)|aspx|htm)

Edit: To be clear: I'd like to specifically prevent ".asmx/js" but allow all other .asmx requests through.

BAD: Portal.asmx/js
GOOD: Portal.asmx/UpdateProduct
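A quick way to check the requirement is to run both example paths through the pattern. This is a minimal Python sketch. It assumes that a match means the request is let through, and is_allowed is a hypothetical helper name:

```python
import re

# The pattern from the question: an .asmx, .aspx or .htm extension,
# except when ".asmx" is immediately followed by "/js".
PATTERN = re.compile(r"\.(asmx(?!/js)|aspx|htm)")

def is_allowed(path: str) -> bool:
    # Hypothetical filter: a match anywhere in the path lets the request through.
    return PATTERN.search(path) is not None

print(is_allowed("Portal.asmx/js"))             # False (BAD)
print(is_allowed("Portal.asmx/UpdateProduct"))  # True (GOOD)
```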

Solution

If you want to block Portal.asmx/js but allow Portal.asmx/UpdateProduct, there are two ways to handle it: a whitelist pattern listing all the accepted values, or a negative lookahead for the unwanted matches.

A negative lookahead is almost certainly going to perform better than listing all the acceptable values, and it does not need updating every time a new method is added.
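To make the two options concrete, here is a Python sketch. GetProduct and DeleteProduct are hypothetical method names used only for illustration. The point is that the whitelist has to be edited whenever a method is added, while the lookahead rejects only the one unwanted suffix:

```python
import re

# Whitelist: every accepted .asmx method is listed explicitly.
# "GetProduct" is a hypothetical second method name.
WHITELIST = re.compile(r"\.asmx/(?:UpdateProduct|GetProduct)$")

# Negative lookahead: reject only "/js" when it is followed by "/" or the end.
LOOKAHEAD = re.compile(r"\.asmx(?!/js(?:/|$))")

for path in ["Portal.asmx/UpdateProduct", "Portal.asmx/DeleteProduct", "Portal.asmx/js"]:
    # Prints the path, whether the whitelist matches, whether the lookahead matches.
    print(path, bool(WHITELIST.search(path)), bool(LOOKAHEAD.search(path)))
```

The hypothetical DeleteProduct shows the trade-off: the lookahead lets it through automatically, while the whitelist blocks it until someone adds it to the list.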

However, your existing expression will not match exactly what you want. It would block, for example, Portal.asmx/json, and allow Portal.asmx/js.aspx. These might not be likely URLs, but they highlight what needs fixing.
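Running these edge cases through the original pattern shows why. This is a minimal Python sketch, again assuming that a match means the request is let through:

```python
import re

# The pattern from the question (match = request is let through).
ORIGINAL = re.compile(r"\.(asmx(?!/js)|aspx|htm)")

# /json is blocked even though only /js should be: the lookahead stops at "js".
print(bool(ORIGINAL.search("Portal.asmx/json")))     # False
# /js.aspx gets through: the lookahead rejects ".asmx", but the unanchored
# "aspx" alternative then matches ".aspx" later in the path.
print(bool(ORIGINAL.search("Portal.asmx/js.aspx")))  # True
```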

This expression (copied from eyelidlessness's answer) will handle things appropriately:

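\.(asmx(?!/js(?:/|$))|aspx$|html?$)


It's worth explaining that the (?:/|$) group matches either a / or the end of the string. This cannot be written as a character class such as [/\z]: a character class always consumes exactly one character, so it can never match "end of string". Depending on the regex engine, \z inside a class is either rejected as an error or treated as a literal z, and $ inside a class is a literal $.
(There are differences between $ and \z: in most engines $ also matches just before a trailing newline, and in multiline mode at the end of every line. Neither is likely to matter for URL filtering.)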
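A Python sketch that checks the corrected expression against the paths discussed here (Default.aspx, Page.html and Portal.aspx.bak are made-up examples), confirms that Python's re module refuses [/\z], and shows an equivalent lookahead without alternation. Other engines may treat \z in a class differently, so the rejection part only demonstrates Python's behaviour:

```python
import re

# Corrected pattern: "slash or end of string" written as the group (?:/|$).
FIXED = re.compile(r"\.(asmx(?!/js(?:/|$))|aspx$|html?$)")

# Expected results, assuming a match means the request is let through.
cases = {
    "Portal.asmx/js": False,            # the endpoint to block
    "Portal.asmx/js/extra": False,      # still the /js endpoint
    "Portal.asmx/UpdateProduct": True,
    "Portal.asmx/json": True,           # no longer caught by the /js rule
    "Default.aspx": True,
    "Page.html": True,
    "Portal.aspx.bak": False,           # the extension must now end the path
}
for path, expected in cases.items():
    allowed = FIXED.search(path) is not None
    print(f"{path:28} {allowed}")
    assert allowed == expected

# A character class consumes exactly one character, so it cannot stand for
# "end of string"; Python's re module rejects \z inside a class outright.
try:
    re.compile(r"[/\z]")
except re.error as exc:
    print("rejected:", exc)

# Equivalent lookahead without alternation: "/js" not followed by a non-slash.
ALT = re.compile(r"\.(asmx(?!/js(?![^/]))|aspx$|html?$)")
assert all((ALT.search(p) is not None) == exp for p, exp in cases.items())
```

The (?![^/]) form reads as "not followed by anything other than a slash", which is true both before a / and at the very end of the string.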


In general, don't worry about performance unless you have a measurable performance problem (otherwise, how will you know whether your change made any difference?).
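If you do want a number before changing anything, a micro-benchmark is cheap to write. This Python sketch uses the standard timeit module. The list of paths is a hypothetical sample; timings from real request logs on your own server would be far more meaningful:

```python
import re
import timeit

PATTERN = re.compile(r"\.(asmx(?!/js(?:/|$))|aspx$|html?$)")

# Hypothetical sample of request paths; substitute paths from real traffic.
PATHS = ["Portal.asmx/js", "Portal.asmx/UpdateProduct", "Default.aspx", "img/logo.png"] * 250

def filter_all() -> int:
    # Count how many paths the filter would let through.
    return sum(1 for p in PATHS if PATTERN.search(p))

NUMBER = 20
# Take the fastest of several runs to reduce noise from other processes.
seconds = min(timeit.repeat(filter_all, number=NUMBER, repeat=5))
print(f"{seconds / (NUMBER * len(PATHS)) * 1e9:.0f} ns per path")
```

Comparing that per-path figure with your total request-handling time shows whether the regex is worth optimizing at all.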

OTHER TIPS

Don't worry about performance of such a simple lookahead. Your regex is fine.

Edit: It may catch false positives, though (e.g. Portal.asmx/jssomething). You might try something like:

\.(asmx(?!/js(?:/|$))|aspx$|html?$)
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow