Question

Context:

When I learned about parsers, the process of compiling code (say, C++) was explained like this:

  1. Write code in a file and save it.
  2. Put the file in a compiler.
  3. The compiler first parses the code into an abstract syntax tree,
  4. then generates machine code.
  5. Run the code to test it.
  6. Repeat.

Bret Victor wanted a kind of programming environment that evaluates the code as you type. ( http://worrydream.com/#!/InventingOnPrinciple )

I guess he wasn't the first, that there may be conceptual problems in translating that concept to general-purpose programming beyond 2D game programming, and I know that some systems already do something similar, e.g. spreadsheets (like Excel) and Smalltalk.

That's not what I want to discuss.


Question: (kind of broad, sorry - the main question is in bold)

How could parsing text while editing work? My idea was that whenever the editor sends an event indicating that some portion of the text has changed, only a portion of the AST gets re-evaluated, and values affected by that portion of the AST are re-evaluated as well.

I thought about writing a parser generator that takes a grammar like usual, but produces a parser that processes incremental changes to a text, instead of a whole text.
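The idea in the two paragraphs above can be sketched in a few lines. This is a hypothetical model (all names are invented for illustration; it is not a real parser-generator API): each AST node knows its character span, an edit marks the innermost enclosing nodes dirty, and re-evaluation visits only dirty subtrees.

```python
# Hypothetical sketch of edit-driven re-evaluation (names invented for
# illustration; not a real parser-generator API). An edit marks the nodes
# whose spans contain it as dirty; only dirty subtrees are re-evaluated.

class Node:
    def __init__(self, start, end, children=None):
        self.start, self.end = start, end   # character span in the source
        self.children = children or []
        self.dirty = False

    def mark_dirty(self, pos):
        """Mark every node whose span contains an edit at `pos`."""
        self.dirty = True
        for child in self.children:
            if child.start <= pos < child.end:
                child.mark_dirty(pos)

    def reevaluate(self, evaluate):
        """Re-run `evaluate` on dirty subtrees only; return the node count."""
        count = 0
        if self.dirty:
            for child in self.children:
                count += child.reevaluate(evaluate)
            evaluate(self)
            self.dirty = False
            count += 1
        return count

# Usage: a root spanning characters 0-10 with two leaves.
leaf_a, leaf_b = Node(0, 5), Node(5, 10)
root = Node(0, 10, [leaf_a, leaf_b])
root.mark_dirty(7)                         # an edit at character 7
print(root.reevaluate(lambda node: None))  # → 2 (leaf_b and root, not leaf_a)
```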

1. Is this a reasonable concept? (For whatever obscure programming language/environment. Something "functional reactive", maybe. Or simply HTML.)

(2.) Is it perhaps already in use somewhere?

(3.) Is parsing the whole file fast enough to make the complicated approach unnecessary?

(4.) Do syntax highlighters or type-checkers in IDEs such as Eclipse work like this? If not, how do they work? I suspect they aren't as powerful as compiler parsers in order to be fast enough; is that right?

(5.) Here on Stack Overflow there is a live preview of the styled text. Does it parse the whole question after every keystroke? Does it have limitations that "my" concept would solve?


Solution

Tab-completion (or "IntelliSense") requires something very similar to a parse in order to figure out what reasonable completions might be. You probably have some experience with that in some IDE. If so, you will also have noticed some of its limitations.

Systems like SO's preview feature periodically parse the input, but not necessarily on every keystroke. You might notice that syntax highlighting lags a bit, particularly when the buffer is full. A typical strategy is to have a single process which repeatedly reparses until the input hasn't changed during the parse, and then waits for the next change.
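That "reparse until the input stops changing" strategy can be sketched as a small loop. This is a single-threaded simulation (both parameter names are invented for illustration): a version counter stands in for edit events, and the worker goes idle once a parse completes on an unchanged buffer.

```python
# Sketch of the "reparse until stable" strategy: keep reparsing while edits
# arrive during a parse, stop once a parse completes with no intervening
# edit. `get_version` and `parse` are stand-ins invented for illustration.

def reparse_until_stable(get_version, parse):
    """Reparse until the buffer version didn't change during the parse."""
    parses = 0
    while True:
        before = get_version()
        parse()
        parses += 1
        if get_version() == before:   # no edits arrived mid-parse: stable
            return parses

# Usage: simulate edits arriving during the first two parses.
versions = iter([1, 2, 2, 3, 3, 3])   # versions observed around each parse
print(reparse_until_stable(lambda: next(versions), lambda: None))  # → 3
```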

Text editors like vim and emacs do reparse on every keystroke, but they optimize by caching context at line endings (usually), so the reparse covers only a few characters. (Of course, they don't do a full parse, so it's even easier.)
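Here is a sketch of that line-ending cache (illustrative only; real editors track far richer state, and all names are invented). The only "context" stored per line is whether it ends inside a `/* ... */` block comment; after an edit, re-lexing starts at the edited line and stops as soon as a line's new end state matches the cached one.

```python
# Sketch of per-line state caching (illustrative; real editors track far
# richer state). The cached "context" per line is a single flag: does the
# line end inside a /* ... */ block comment?

def lex_line(line, in_comment):
    """Return whether we are inside a block comment after scanning `line`."""
    i = 0
    while i < len(line):
        if not in_comment and line[i:i+2] == "/*":
            in_comment, i = True, i + 2
        elif in_comment and line[i:i+2] == "*/":
            in_comment, i = False, i + 2
        else:
            i += 1
    return in_comment

def relex_from(lines, states, edited):
    """Re-lex from line `edited`; return how many lines were re-lexed."""
    state = states[edited - 1] if edited > 0 else False
    relexed = 0
    for i in range(edited, len(lines)):
        new_state = lex_line(lines[i], state)
        relexed += 1
        if states[i] == new_state:
            break                     # downstream lines are unaffected: stop
        states[i] = state = new_state
    return relexed

# Build the initial cache, then edit line 1 so the comment never opens.
lines = ["int a;", "/* start", "still comment", "*/ int b;", "int c;"]
states, s = [], False
for ln in lines:
    s = lex_line(ln, s)
    states.append(s)                  # [False, True, True, False, False]
lines[1] = "// start"
print(relex_from(lines, states, 1))  # → 3 (lines 1-3; lines 4+ untouched)
```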

There has been some research into incremental parsing and in-place editing of abstract syntax trees, but it turns out to be quite tricky. One parsing strategy which lends itself naturally to this style is "packrat parsing" (an extensive bibliography is available here).
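For illustration, here is a minimal packrat-style parser (memoized recursive descent) for the toy grammar `expr <- num ('+' num)*`, `num <- [0-9]+`. The `(rule, position)` memo table is what makes packrat parsing linear-time, and it is the structure that incremental variants invalidate around an edit; this sketch only shows the memoization itself, not the invalidation.

```python
# Minimal packrat-style parser (memoized recursive descent) for the toy
# grammar  expr <- num ('+' num)* , num <- [0-9]+ . Illustrative only.

def make_parser(text):
    memo = {}

    def memoize(rule):
        def wrapped(pos):
            key = (rule.__name__, pos)   # packrat memo key: (rule, position)
            if key not in memo:
                memo[key] = rule(pos)
            return memo[key]             # either None or (value, new_pos)
        return wrapped

    @memoize
    def num(pos):
        end = pos
        while end < len(text) and text[end].isdigit():
            end += 1
        return (int(text[pos:end]), end) if end > pos else None

    @memoize
    def expr(pos):
        result = num(pos)
        if result is None:
            return None
        total, pos = result
        while pos < len(text) and text[pos] == "+":
            nxt = num(pos + 1)
            if nxt is None:
                return None
            total, pos = total + nxt[0], nxt[1]
        return (total, pos)

    return expr, memo

expr, memo = make_parser("1+2+30")
print(expr(0))            # → (33, 6): the value and the position consumed
```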

C++ is notoriously difficult to parse correctly. Indeed, it's non-trivial to figure out whether a given < is a template-bracket or a less-than sign; in general, you can't do that without reading all header files, and in some cases you can't figure it out without instantiating templates; that's far too much work to do interactively. Many other languages are easier to parse, and the simple solution of reparsing periodically will be fast enough for all practical purposes.

Hope that hits most of your questions.

Other tips

This is absolutely an interesting question. I use a parser somewhat similar to what you describe in my GoWorks demonstration IDE. Here is a video which shows the parser in action (starts at the 5 minute mark):

Intro to Tunnel Vision Labs' GoWorks IDE (Preview Release 7)

The parser doesn't parse the entire file every time because the necessary parse information can be derived from only a subset of the input. Parsing more than this has several drawbacks, including the following major items.

  1. If you parse unnecessary parts of the file, and those parts contain syntax errors (common case when the user is actively editing), the parser may not be able to accurately recover from the error.
  2. Parsing more information than necessary, even when fast enough to appear "smooth" to the user, has a major impact on battery life for users on the go.

To my knowledge, our latest IDEs are the only ones that actively use this particular style of parser (a very specific subset of what you describe, designed for use in IDEs) as a general approach across multiple languages.

It is mostly done in one of two ways:

  • Use a much simpler raw parser that carries just enough information for colouring. It is fast enough because it is simple, but it might trip over complex language features.
  • Put position info in the AST, and just fix up the tree. If you keep some history, you can avoid marking everything after the current mistake as wrong.

Smalltalk, of course, parses just one method at a time. That makes the solution faster and simpler: there is simply less to interpret wrongly. Reparsing while typing can also be limited; as long as an identifier is not recognized, reparsing the part after it doesn't make much sense.

A long time ago there was the THINK Pascal compiler for the Mac (and also a C one) that had this working very well.

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow