Question

I'm masochistically writing an open-source text editor for Mac and have finally reached the point at which I want to add syntax highlighting. I've been going back and forth on various solutions for the past few days, and I've finally decided to open the question to a wider audience.

Here are the options I see:

  • Define languages with a series of regex patterns (similar to how TextMate defines its languages)
  • Define languages with a formal grammar like BNF or PEG

Using regex pattern matching seems less than ideal, as it cannot represent a language nearly as well as a formal grammar; however, some less formal languages will have a hard time fitting into BNF (e.g. Markdown -- though I know there's a great PEG implementation).
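To make the limitation concrete, here is a minimal sketch, assuming a language whose /* ... */ block comments are allowed to nest. The function name nested_comment_end and the sample source are made up for illustration. A flat regex stops at the first closing delimiter, while the depth counter (which is what a recursive grammar rule would express) finds the real end of the comment.

```python
import re

# Non-greedy regex for a /* ... */ block comment; it has no notion of nesting.
FLAT_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

def nested_comment_end(text, start=0):
    """Return the index just past the nested comment beginning at `start`.

    Returns None if no comment starts there or if it is never closed.
    """
    if not text.startswith("/*", start):
        return None
    depth = 0
    i = start
    while i < len(text):
        if text.startswith("/*", i):
            depth += 1          # entering a (possibly nested) comment
            i += 2
        elif text.startswith("*/", i):
            depth -= 1          # leaving one level of comment
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    return None                 # unterminated comment

source = "/* outer /* inner */ still comment */ code"
regex_end = FLAT_COMMENT.match(source).end()
nested_end = nested_comment_end(source)
```

With this input the regex stops after the inner `*/`, so " still comment */" would be coloured as code, while the counter consumes the whole comment.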

What are the performance tradeoffs for live syntax highlighting? What about flexibility for a wide range of languages?

If I go the BNF route, Todd Ditchendorf created the awesome ParseKit framework, which would work nicely out of the box. Does anyone know of anything similar for PEGs?


Solution

Unless you want to fight the battle of getting a full context-free (or worse, a full context-sensitive) grammar completely correct for every language you want to process (or worse, for every dialect of each language... how many kinds of C++ are there?), for the purposes of syntax highlighting you're probably better off giving up on complete correctness and accepting that sometimes you'll get it wrong. In that case, regexps seem like an extremely good answer. They can also be very fast, so they won't interfere with the person doing the editing.
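As a rough sketch of the regexp approach (not how TextMate or any particular editor actually implements it), the rules below describe a made-up C-like language. The rule names, the patterns, and the highlight function are all assumptions for illustration. The rules are joined into one alternation so the text is scanned in a single pass.

```python
import re

# Hypothetical rules for a small C-like language. Order matters: when two
# rules could match at the same position, the earlier one wins.
RULES = [
    ("comment", r"//[^\n]*"),
    ("string",  r'"(?:\\.|[^"\\\n])*"'),
    ("keyword", r"\b(?:if|else|while|return)\b"),
    ("number",  r"\b\d+(?:\.\d+)?\b"),
]

# One combined pattern with a named group per rule.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES))

def highlight(text):
    """Return (start, end, kind) spans; text matching no rule stays unstyled."""
    return [(m.start(), m.end(), m.lastgroup) for m in MASTER.finditer(text)]

line = 'if (x > 10) return "hi"; // done'
spans = highlight(line)
```

Anything the rules do not recognise is simply left uncoloured, which is the "sometimes you'll get it wrong" trade-off in practice: the highlighter never fails, it just colours less.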

If you insist on doing full syntax checking/completion (I don't think you are), then you'll need that full grammar. It will also take you a very long time to produce editors for real languages.

Sometimes it is better not to be too serious. A 98% solution that you can get is better than a 100% solution that never materializes.

OTHER TIPS

It might not be exactly what you need, since you are writing the editor yourself, but there is an awesome framework called Xtext that will generate a complete editor, with syntax coloring, a customizable outline view, and auto-completion, based on a grammar for your language: http://eclipse.org/Xtext

In addition to the problems of getting a grammar to work for a language, there is the added complexity of trying to get it to work for code that is in the middle of being edited.
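One way regex-based highlighting sidesteps this is by being tolerant of incomplete input. The sketch below assumes a language with no multi-line constructs, so each line can be highlighted on its own. TOKEN, highlight_line, and rehighlight are hypothetical names, not part of any framework. The closing quote of a string is optional in the pattern, so a string that is still being typed is coloured to the end of the line, and only the edited line is rescanned.

```python
import re

# Hypothetical tolerant rules: the closing quote is optional, so a
# half-typed string is still recognised as a string.
TOKEN = re.compile(r'(?P<string>"(?:\\.|[^"\\\n])*"?)|(?P<number>\b\d+\b)')

def highlight_line(line):
    """Return (start, end, kind) spans for a single line."""
    return [(m.start(), m.end(), m.lastgroup) for m in TOKEN.finditer(line)]

def rehighlight(lines, cache, changed_index):
    """Recompute spans for the edited line only; other cached lines are reused."""
    cache[changed_index] = highlight_line(lines[changed_index])
    return cache

lines = ["x = 1", 'name = "unfinished']
cache = [highlight_line(line) for line in lines]

# The user keeps typing on the second line without closing the quote yet.
lines[1] = 'name = "unfinished 42'
cache = rehighlight(lines, cache, 1)
```

A grammar-based parser would typically reject the second line until the quote is closed, unless it has explicit error recovery. Languages with multi-line strings or block comments would need some state carried from one line to the next, which this sketch leaves out.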

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow