Question

I am stumped trying to create an Emacs regular expression that excludes groups. [^] excludes individual characters in a set, but I want to exclude specific sequences of characters: something like [^(not|this)], so that strings containing "not" or "this" are not matched.

In principle, I could write ([^n][^o][^t]|[^...]), but is there another way that's cleaner?


Solution

First of all: [^n][^o][^t] is not a solution. This would also exclude words like nil ([^n] does not match), bob ([^o] does not match) or cat ([^t] does not match).

But it is possible to build a regular expression with basic syntax that matches strings containing neither not nor this:

^([^nt]|n($|[^o]|o($|[^t]))|t($|[^h]|h($|[^i]|i($|[^s]))))*$

The idea behind this regular expression is to allow any character that is not the first character of either word, and to allow proper prefixes of the words (n, no, t, th, thi) as long as they are not completed into the whole word.
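A quick way to sanity-check the pattern is to run it over a few strings. The sketch below uses Python's re module only because it accepts the pattern exactly as written; in Emacs syntax the groups and alternation would be written with \(, \| and \). The sample strings and the helper name accepted are my own choices for illustration. The last check suggests a limitation: when a partial prefix is followed directly by the start of an excluded word, as in nnot or nthis, the [^o] alternative consumes that first character and the string still matches.

```python
import re

# The pattern from the answer, unchanged (extended syntax, not Emacs syntax).
PATTERN = re.compile(
    r"^([^nt]|n($|[^o]|o($|[^t]))|t($|[^h]|h($|[^i]|i($|[^s]))))*$"
)


def accepted(s):
    """Return True if the whole string matches the pattern (hypothetical helper)."""
    return PATTERN.match(s) is not None


# Strings without "not" or "this": expected to be accepted.
print([s for s in ["nil", "bob", "cat", "thin", "no"] if accepted(s)])

# Strings containing "not" or "this": expected to be rejected.
print([s for s in ["not", "this", "knot", "nothing"] if accepted(s)])

# Overlapping prefixes: these contain an excluded word but are still accepted.
print([s for s in ["nnot", "nthis"] if accepted(s)])
```

If those overlapping cases matter, matching the unwanted words and negating the result in code is likely the more robust route.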

OTHER TIPS

This is not easily possible. Regular expressions are designed to match things, and this is all they can do.

First off: [^] does not designate an "excludes group", it designates a negated character class. Character classes do not support grouping in any form or shape. They support single characters (and, for convenience, character ranges). Your try [^(not|this)] is 100% equivalent to [^)(|hinots], as far as the regex engine is concerned.
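The equivalence can be checked mechanically. The following is a small sketch in Python (chosen only because it is easy to run; the same reasoning applies to Emacs character classes) that compares the two classes character by character. The function name same_behaviour is made up for this example.

```python
import re
import string

# The attempted "group exclusion" and the plain negated class it reduces to.
literal_attempt = re.compile(r"[^(not|this)]")
plain_class = re.compile(r"[^)(|hinots]")


def same_behaviour(chars=string.printable):
    """Return True if both classes accept exactly the same single characters."""
    return all(
        bool(literal_attempt.fullmatch(c)) == bool(plain_class.fullmatch(c))
        for c in chars
    )


print(same_behaviour())
# "x" is outside the set, so it matches; "n" and "|" are members, so they do not.
print(bool(literal_attempt.fullmatch("x")))
print(bool(literal_attempt.fullmatch("n")))
print(bool(literal_attempt.fullmatch("|")))
```

Inside the brackets, the parentheses and the bar are ordinary characters, which is why "|" is excluded just like the letters.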

Three ways can lead out of this situation:

  1. match (not|this) and exclude any matches with the help of the environment you are in (negate match results)
  2. use negative look-ahead, if supported by your regex engine and feasible in the situation
  3. rewrite the expression so it can match: see a similar question I asked earlier
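The first two options can be sketched side by side. This example is in Python, whose re module supports negative look-ahead with (?!...); Emacs regexps do not, so in Emacs only the first option carries over directly. The sample lines and function names are assumptions made for illustration.

```python
import re

# Option 1 uses a plain pattern for the unwanted words.
EXCLUDED = re.compile(r"not|this")

# Option 2: the look-ahead fails if an excluded word appears anywhere in the line.
LOOKAHEAD = re.compile(r"^(?!.*(?:not|this)).*$")


def keep_by_negation(lines):
    """Option 1: match the unwanted words, then negate the result in code."""
    return [line for line in lines if not EXCLUDED.search(line)]


def keep_by_lookahead(lines):
    """Option 2: let the regex engine reject the line via negative look-ahead."""
    return [line for line in lines if LOOKAHEAD.match(line)]


lines = ["nil", "do not enter", "about this", "nnot", "plain text"]
print(keep_by_negation(lines))
print(keep_by_lookahead(lines))
```

Both functions should keep the same lines; the first is usually easier to read and works with any regex engine.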

Hard to believe that the accepted answer (from Gumbo) was actually accepted! Unless it was accepted because it indicated that you cannot do what you want. Unless you have a function that generates such regexps (as Gumbo shows), composing them would be a real pain.

What is the real use case -- what are you really trying to do?

As Tomalak indicated, (a) this is not what regexps do; (b) see the other post he linked to, for a good explanation, including what to do about your problem.

The answer is to use a regexp to match what you do not want, and then subtract that from the initial domain. In other words, do not try to make the regexp do the excluding (it cannot); do the excluding after using a regexp to match what you want to exclude.

This is how tools that use regexps typically work (e.g., grep with its -v option): they offer a separate option that carries out the subtraction after matching what needs to be subtracted.

It sounds like you are trying to do negative lookahead, i.e., you are trying to stop matching once you reach some delimiter.

Emacs doesn't support lookahead directly, but it does support the non-greedy versions of the *, +, and ? operators (*?, +?, ??), which can be used for the same purpose in most cases.

So for instance, to match the body of this JavaScript function:

bar = function (args) {
    if (blah) {
        foo();
    }
};

You can use this Emacs regex:

function ([^)]+) {[[:ascii:]]+?};

Here we're stopping once we find the two-character sequence "};". [[:ascii:]] is used instead of the "." operator because it also matches newlines, so it works over multiple lines.

This is a little different from negative lookahead because the }; sequence itself is matched. However, if your goal is to extract everything up until that point, you can just wrap that part in a capturing group, \( and \).
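The same non-greedy idea, with a capturing group around the body, can be tried outside Emacs. Below is a rough Python translation, not the Emacs regex itself: Python requires the literal parentheses to be escaped, and re.DOTALL lets "." span lines, standing in for [[:ascii:]]. The variable names are chosen for this example.

```python
import re

source = """bar = function (args) {
    if (blah) {
        foo();
    }
};
"""

# Literal ( ) { } are escaped here; (.+?) is the non-greedy capturing group,
# so the match stops at the first "};" it can reach.
BODY = re.compile(r"function \([^)]+\) \{(.+?)\};", re.DOTALL)

match = BODY.search(source)
body = match.group(1)
print(body)
```

The group holds everything between the opening brace and the closing "};", while the delimiter itself is part of the overall match but not of the group.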

See the Emacs regex manual: http://www.gnu.org/software/emacs/manual/html_node/emacs/Regexps.html

As a side note, if you are writing any kind of Emacs regex, be sure to invoke M-x re-builder, which will bring up a little IDE for writing your regex against the current buffer.

Try M-x flush-lines.

For the use case of testing whether a string matches, I do this:

;; Match strings that end with '-region' but exclude those that contain 'mouse'.
M-x ielm RET
*** Welcome to IELM ***  Type (describe-mode) for help.
ELISP> (setq str1 "mouse-drag-region" str2 "mou-drag-region" str3 "mou-region-drag")
"mou-region-drag"
ELISP> (and (string-match-p "-region$" str1) (not (string-match-p "mouse" str1)))
nil
ELISP> (and (string-match-p "-region$" str2) (not (string-match-p "mouse" str2))) 
t
ELISP> (and (string-match-p "-region$" str3) (not (string-match-p "mouse" str3)))
nil

I use this approach to avoid a bug in a function that I discussed in another post.

Licensed under: CC-BY-SA with attribution