Question

I'm trying to learn how to write Emacs major modes. There are lots of great tutorials online (e.g. http://www.emacswiki.org/emacs/GenericMode), but I'm struggling to learn the regexp syntax. For example, from this answer I'm trying to understand why

'(("\"\\(\\(?:.\\|\n\\)*?[^\\]\\)\""

from

(define-derived-mode rich-text-mode text-mode "Rich Text"
  "text mode with string highlighting."

  ;;register keywords
  (setq rich-text-font-lock-keywords
        '(("\"\\(\\(?:.\\|\n\\)*?[^\\]\\)\"" 0 font-lock-string-face)))
  ;; font-lock-defaults is a list whose first element names the keywords variable
  (setq font-lock-defaults '(rich-text-font-lock-keywords))
  (font-lock-mode 1))

matches anything between double quotation marks. This material: http://www.gnu.org/software/emacs/manual/html_node/elisp/Regexp-Special.html#Regexp-Special doesn't seem to explain that.

Are there any better resources out there?


Solution

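To answer your question about what the regexp does: the regexp in the example you cite is actually "\"\\(\\(?:.\\|\n\\)*?[^\\]\\)\"". It is written as a Lisp string, so \" stands for a literal " char and each \\ stands for a single backslash in the regexp itself.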

The parts to match are:

  • \", which matches only a " char. It appears at the beginning and the end of the regexp.

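  • A group, which contains \\(?:.\\|\n\\)*? followed by [^\\]. The group is presumably there so that font-lock-keywords can be told to do something with that part of a match, i.e., the part between the matching " at the beginning and end. Note, however, that the keyword entry in the question uses subexpression 0, which is the whole match including both " chars, so the group is not actually used there.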

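  • \\(?:.\\|\n\\)*?, the first part of the group, matches zero or more characters of any kind. The \\(?: ... \\) is a shy (non-capturing) group. The . matches any char except a newline char, and the \n matches a newline char. The \\| means either of those is OK. The *? is the non-greedy form of *: it matches as few characters as possible, so the match ends at the first suitable closing ". It is not interchangeable with a plain *, which is greedy and would extend the match to the last suitable " in the text, swallowing everything between the first string and the last.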

  • [^\\] matches any character except a backslash (\).
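The pieces above can be tried out directly with string-match, for example in the *scratch* buffer. This is only a small sketch: the names my-demo-string-regexp, my-demo-greedy-regexp, and my-demo-first-match are made up for illustration and are not part of the original mode. It compares the regexp from the question with a variant that uses a greedy * in place of *?.

```elisp
;; The regexp from the question, written as a Lisp string.
;; Inside a Lisp string, \" is a literal double quote and \\ is a single
;; backslash, so the regexp engine sees:  "\(\(?:.\|<newline>\)*?[^\]\)"
(defvar my-demo-string-regexp "\"\\(\\(?:.\\|\n\\)*?[^\\]\\)\""
  "Hypothetical name for the regexp discussed above.")

;; Same regexp, except that the non-greedy *? is replaced by a greedy *.
(defvar my-demo-greedy-regexp "\"\\(\\(?:.\\|\n\\)*[^\\]\\)\""
  "Hypothetical greedy variant, for comparison only.")

(defun my-demo-first-match (regexp text)
  "Return the text of group 1 for the first match of REGEXP in TEXT, or nil."
  (when (string-match regexp text)
    (match-string 1 text)))

;; Non-greedy: the match stops at the first closing quote.
(my-demo-first-match my-demo-string-regexp "say \"one\" and \"two\"")
;; => "one"

;; Greedy: the match runs on to the last closing quote.
(my-demo-first-match my-demo-greedy-regexp "say \"one\" and \"two\"")
;; => "one\" and \"two"
```

Evaluating each form with C-x C-e shows the value of group 1, which is the part between the opening and closing " chars.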

So putting it together, the group matches zero or more chars followed by a char that is not a backslash. Why not just use a regexp that matches zero or more chars between " chars? Presumably because the person wanted to make sure the ending " was not escaped (by a backslash). However, note that the regexp requires there to be at least one char between the " chars, so that regexp does not match the empty string, "".
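Both observations, the handling of an escaped " and the failure on "", can be checked with a short sketch. The names my-demo-quoted-regexp and my-demo-whole-match are hypothetical helpers introduced only for this illustration.

```elisp
;; The regexp from the question, under a made-up name.
(defvar my-demo-quoted-regexp "\"\\(\\(?:.\\|\n\\)*?[^\\]\\)\""
  "Hypothetical name for the regexp discussed above.")

(defun my-demo-whole-match (text)
  "Return the whole first match of `my-demo-quoted-regexp' in TEXT, or nil."
  (when (string-match my-demo-quoted-regexp text)
    (match-string 0 text)))

;; The text is  "a \"quoted\" word"  with literal backslashes in it.
;; The escaped quotes are skipped because [^\\] refuses a backslash
;; right before the closing quote, so the whole text is matched.
(my-demo-whole-match "\"a \\\"quoted\\\" word\"")
;; => "\"a \\\"quoted\\\" word\""

;; The text is  x = ""  and there is no character for [^\\] to consume
;; between the two quotes, so nothing matches.
(my-demo-whole-match "x = \"\"")
;; => nil
```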

A good resource is: http://www.emacswiki.org/emacs/RegularExpression.

Licensed under: CC-BY-SA with attribution