Escape path separator in a regular expression
Question
I need to write a regular expression that finds JavaScript files whose paths match
<anypath><slash>js<slash><anything>.js
For example, it should work for both:
- c:\mysite\js\common.js (Windows)
- /var/www/mysite/js/common.js (UNIX)
The problem is that the Windows file separator is not being escaped properly:
pattern = Pattern.compile(
"^(.+?)" +
File.separator +
"js" +
File.separator +
"(.+?).js$" );
This throws:
java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence
Is there any way to use a common regular expression that works on both Windows and UNIX systems?
Solution
Does Pattern.quote(File.separator) do the trick?
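A minimal sketch of how Pattern.quote could be applied to the pattern from the question. The class name QuotedSeparatorDemo and the helper jsPattern are made up for illustration. The separator is passed in as a parameter, rather than read from File.separator, only so that both cases can be tried on one machine. The dot before "js" is also escaped here so that it matches a literal dot:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedSeparatorDemo {
    // Hypothetical helper: builds the question's pattern for a given separator.
    // Pattern.quote wraps the separator in \Q...\E so it is matched literally.
    static Pattern jsPattern(String separator) {
        String sep = Pattern.quote(separator);
        return Pattern.compile("^(.+?)" + sep + "js" + sep + "(.+?)\\.js$");
    }

    public static void main(String[] args) {
        Matcher win = jsPattern("\\").matcher("c:\\mysite\\js\\common.js");
        Matcher unix = jsPattern("/").matcher("/var/www/mysite/js/common.js");
        System.out.println(win.matches() + " " + win.group(2));   // true common
        System.out.println(unix.matches() + " " + unix.group(2)); // true common
    }
}
```

In real code the argument would presumably be File.separator, so the same source compiles to the right pattern on either platform.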
EDIT: Pattern.quote is available in Java 1.5 and later. For 1.4, you need to escape the file separator character yourself:
"\\" + File.separator
Escaping punctuation characters will not break anything, but escaping letters or numbers unconditionally will either change them to their special meaning or lead to a PatternSyntaxException. (Thanks Alan M for pointing this out in the comments!)
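The difference between escaping punctuation and escaping letters can be seen in a small sketch. It also suggests why the code in the question fails: on Windows File.separator is "\", so the pattern text ends up containing "\js", and \j is not a defined escape. The class name EscapeRulesDemo and the helper compiles are hypothetical:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class EscapeRulesDemo {
    // Hypothetical helper: reports whether a regex compiles without error.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // An escaped punctuation character is just that literal character.
        System.out.println(Pattern.matches("\\/", "/")); // true
        // An escaped letter with a special meaning changes what is matched:
        // \d is "any digit", not the letter d.
        System.out.println(Pattern.matches("\\d", "d")); // false
        System.out.println(Pattern.matches("\\d", "7")); // true
        // An escaped letter with no defined meaning is rejected outright.
        System.out.println(compiles("\\js"));            // false
    }
}
```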
OTHER TIPS
Is there any way to use a common regular expression that works on both Windows and UNIX systems?
Yes, just use a regex that matches both kinds of separator.
pattern = Pattern.compile(
"^(.+?)" +
"[/\\\\]" +
"js" +
"[/\\\\]" +
"(.+?)\\.js$" );
It's generally safe: Windows permits neither character in a file or directory name, and Unix does not permit "/". Unix does technically allow a backslash in a name, but that is rare in practice.
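As a quick check, the pattern above can be run against both example paths from the question. This is only a sketch: the class name EitherSeparatorDemo and the helper baseName are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EitherSeparatorDemo {
    // Same pattern as in the answer above: the character class accepts "/" or "\".
    static final Pattern JS_FILE = Pattern.compile(
            "^(.+?)" + "[/\\\\]" + "js" + "[/\\\\]" + "(.+?)\\.js$");

    // Hypothetical helper: returns the part between "js<slash>" and ".js",
    // or null if the path does not match.
    static String baseName(String path) {
        Matcher m = JS_FILE.matcher(path);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(baseName("c:\\mysite\\js\\common.js"));       // common
        System.out.println(baseName("/var/www/mysite/js/common.js"));    // common
        System.out.println(baseName("/var/www/mysite/css/common.css"));  // null
    }
}
```

Because the pattern does not depend on File.separator, it behaves the same regardless of the platform the code runs on.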
Can't you just use a backslash to escape the path separator like so:
pattern = Pattern.compile(
"^(.+?)\\" +
File.separator +
"js\\" +
File.separator +
"(.+?)\\.js$" );
Why don't you escape File.separator to fit Pattern.compile requirements?
... +
"\\" + File.separator +
...
I hope the escaped "/" (the Unix case) is still processed as a single "/".
I've tested gimel's answer on a Unix system: putting "\\" + File.separator works fine, and the resulting "\/" in the pattern correctly matches a single "/".
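The same check can be reproduced on any platform by passing the separator in explicitly. A rough sketch, where the class name BackslashEscapeCheck and the helper jsPattern are hypothetical, and the dot before "js" is escaped so it matches a literal dot:

```java
import java.util.regex.Pattern;

class BackslashEscapeCheck {
    // Hypothetical helper: the question's pattern with a backslash
    // prepended to the separator.
    static Pattern jsPattern(String separator) {
        String sep = "\\" + separator; // regex text \/ for "/", \\ for "\"
        return Pattern.compile("^(.+?)" + sep + "js" + sep + "(.+?)\\.js$");
    }

    public static void main(String[] args) {
        // Show the regex text that is actually compiled in the Unix case.
        System.out.println(jsPattern("/").pattern()); // ^(.+?)\/js\/(.+?)\.js$
        System.out.println(
            jsPattern("/").matcher("/var/www/mysite/js/common.js").matches()); // true
        System.out.println(
            jsPattern("\\").matcher("c:\\mysite\\js\\common.js").matches());   // true
    }
}
```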