Structuring a Compiler in a Dynamic Language (JavaScript)

https://softwareengineering.stackexchange.com/questions/418389

17-03-2021
|

Pregunta

For learning purposes, I'm trying to build a compiler in JavaScript for a tiny custom language and turn it into WASM. So far, I've got a lexer and parser, that turn my code into an AST, my question is about the next step: compiling/emitting WASM instructions from the AST.

I'm wondering, given I'm doing this in a dynamic language like JavaScript, what is the best approach to compiling from my AST. Most of the resources I can find online about building compilers revolve around using the Visitor pattern (or its reflective variant), which is great but I feel like, given JavaScript is a dynamic language, there might be a better approach that is more idiomatic or simpler yet still as powerful.

I somewhat fail to see the benefits of the visitor pattern in JavaScript, since I can just check for the type of each node at runtime. In that case, shouldn't I just be able to switch on the type and emit whatever instructions match the AST. I've seen an implementation do just that, but I'm unsure of how extensible that is in the long run. (I know this sounds like premature optimization, but I simply want to learn about the different approaches available).

Any thoughts would be appreciated!

Solución

Either approach works.

It really boils down to where you want your state control logic.

If you like the AST nodes to be opaque and self-aware objects (in the sense of true domain objects), then visitors are for you. This way the Node can give 0 away, but still appropriately hand information to the visitor. On the down side, the AST must provide a way to access/edit the information that preserves state. Not necessarily a trivial task and not necessarily at the right level when thinking about Graphs.

If you like the AST nodes to be dumber than door nails Plain Old Data, then using their type or a tag type is for you. This way the data structure can be read as appropriate for whatever usecase you envisage. On the downside every reader, and writer needs to be kept in sync as to what each value means.

In JavaScript you could go either way. But if you do go the second way, prefer tag types of direct types (aside from primitives) as the key benefit of the JavaScript model is duck-typing. That's a hard feature to keep around when you are keenly interested in an objects lineage.

Licenciado bajo: CC-BY-SA con atribución

No afiliado a softwareengineering.stackexchange