JavaScript regex to match a pattern but NOT match a regex literal (r.js optimizer and uglify issue)?

https://stackoverflow.com/questions/14741958

07-03-2022

Question

I've got a Backbone application, organized into modules using Require.js. One of these modules contains a Handlebars helper, which has a method I use to pull a legal header off of all our HTML templates for each View. The header is contained in an HTML comment, so I use the following regex to strip it off:

/<!--[\s\S]*?-->/g

Now, when I optimize (concatenate/compile/minify) the application using r.js, I'm doing the same removal of HTML comments using the onBuildWrite() method of r.js:

onBuildWrite: function (moduleName, path, contents) {
    var htmlCommentRegex = /<!--[\s\S]*?-->/g;

    return contents.replace(htmlCommentRegex, "");
},

Now, unfortunately, this means that when the Require.js module containing the Handlebars helper is pulled into the r.js optimized build, the regex literal within the helper is stripped out, causing my entire r.js build to bomb out.

I've tried to resolve the issue by selectively applying the regex in onBuildWrite() to all modules EXCEPT the helper:

onBuildWrite: function (moduleName, path, contents) {
    var htmlCommentRegex = /<!--[\s\S]*?-->/g;

    if (moduleName !== "helpers/handlebars.compileClean") {
        contents = contents.replace(htmlCommentRegex, "");
    }

    return contents;
},

But this doesn't appear to work when uglification is enabled in the r.js configuration; the regex STILL seems to be running on the entire built script, including the helper, causing the build to bomb out.

If uglify is disabled in the r.js config, everything works fine.

Anyone have any ideas why uglify would break this? Would switching to a different regex, that would capture HTML comments but IGNORE the HTML comment regex literal, solve the issue? If so, what would that regex look like?

Solution

Change your regexp to:

var htmlCommentRegex = /[<]!--[\s\S]*?-->/g;

The single-character class [<] is equivalent to < as far as the RE processor is concerned, but the RE's source text no longer contains the sequence <!--, so the RE no longer matches itself.
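To see the difference concretely, here is a minimal sketch that runs both patterns against a sample template and against a line of source text containing the regex literal itself. The template string and the `var re = ...;` line are made up for illustration; they are not taken from the actual build.

```javascript
// Sketch: compare the original pattern with the character-class variant.
var original = /<!--[\s\S]*?-->/g;
var safe = /[<]!--[\s\S]*?-->/g;

// Hypothetical template text with a legal header in an HTML comment.
var template = "<!-- Legal header -->\n<div>Hello</div>";

// Both patterns strip the comment from the template.
var strippedByOriginal = template.replace(original, "");
var strippedBySafe = template.replace(safe, "");

// Simulate a built file that contains each regex literal as source text.
var builtWithOriginal = "var re = " + original.toString() + ";";
var builtWithSafe = "var re = " + safe.toString() + ";";

// The original pattern matches inside its own literal; the variant does not.
var mangled = builtWithOriginal.replace(original, "");
var intact = builtWithSafe.replace(safe, "");

console.log(strippedByOriginal === strippedBySafe); // same result on the template
console.log(mangled);                               // the literal has been gutted
console.log(intact === builtWithSafe);              // the variant's source survives
```

In this sketch the original pattern removes the body of its own literal from the simulated source line, which is the kind of damage that breaks the build, while the [<] variant leaves its own source untouched.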

Another way is to escape one of the literal characters in the RE:

var htmlCommentRegex = /<\!--[\s\S]*?-->/g;

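Or you could build the RE from strings (backslashes must be doubled inside a string literal, otherwise \s and \S reach the RegExp constructor as plain s and S):

var htmlCommentRegex = new RegExp('<!'+'--[\\s\\S]*?-->', 'g');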
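One thing to watch when building the pattern from strings: inside a string literal the backslash is itself an escape character, so \s has to be written as \\s to reach the RegExp constructor intact. A quick check, using a hypothetical sample string, shows what each spelling produces:

```javascript
// In a string literal, "\s" is just "s", so the class collapses to [sS].
var broken = new RegExp('<!' + '--[\s\S]*?-->', 'g');

// Doubling the backslashes keeps \s and \S in the compiled pattern.
var fixed = new RegExp('<!' + '--[\\s\\S]*?-->', 'g');

// Hypothetical input: a comment followed by some markup.
var sample = "<!-- Legal header --><p>Body</p>";

var brokenResult = sample.replace(broken, "");
var fixedResult = sample.replace(fixed, "");

console.log(broken.source); // pattern text as the engine sees it
console.log(fixed.source);
console.log(brokenResult);
console.log(fixedResult);
```

Inspecting `.source` is a cheap way to confirm that the string-built RE is the same pattern as the literal it replaces.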

If r.js is optimizing all these back to the original text, try this:

var commentPrefix = '<!';
var htmlCommentRegex = new RegExp(commentPrefix+'--[\\s\\S]*?-->', 'g');

Hopefully it doesn't do enough code analysis to undo this obfuscation.

Licensed under: CC-BY-SA with attribution

Not affiliated with Stack Overflow