Why do we need to embed an interpreter in a program?

https://softwareengineering.stackexchange.com/questions/384900

18-02-2021

Question

Emacs starts up as an editor (which probably has m functions that take n inputs) and an Elisp interpreter running in the background (which can be used to change the behavior of the program, probably so much so that it is no longer Emacs :-)).

Why do programs need an extension that is an interpreter? Is there any theory behind this? What fundamental feature does it provide, so that you can make a similar decision for your own project?

Assuming that this is how a (Linux) program is laid out in memory,

is it because without an interpreter (lying in the text segment), your program is just a finite machine that can execute a finite set of instructions in the text segment (real code, a.k.a. machine instructions) present in the program layout? However, when you add something like an interpreter, you can add new instructions (probably in the heap, because data and instructions are both just bits?) and make it behave like an infinite machine?

I think it is the same as asking why you need an interpreter in the first place(!), but my question actually came from this specific scenario in Emacs-like editors. So I would like to understand this from both perspectives.

Solution

your program is just a finite machine that can execute a finite set of instructions

Stricto sensu, this is true, but not very interesting. Indeed, my desktop has only three terabytes of memory (including RAM, disk, registers, etc.). So it has "only" 2^{(3*1099511627776*8)} states, but viewing my desktop as such a huge finite machine is not very interesting. Observe that I can buy some more memory (e.g. add more disk).
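As a rough check of the size of that number, here is a small sketch. It assumes a terabyte means 2^40 bytes (the answer does not say which convention it uses), and the function names are only illustrative. It computes the exponent of the state count and estimates how many decimal digits the count itself would have, without ever building the huge number.

```python
import math

BYTES_PER_TERABYTE = 2**40   # assumption: binary terabyte (TiB)
BITS_PER_BYTE = 8

def state_bits(terabytes: int) -> int:
    """Number of bits of storage, i.e. the exponent n in 2**n states."""
    return terabytes * BYTES_PER_TERABYTE * BITS_PER_BYTE

def approx_decimal_digits(bits: int) -> int:
    """Approximate number of decimal digits of 2**bits.

    Uses floating-point logarithms, so for very large inputs the last
    digit or so of the result is only an estimate.
    """
    return math.floor(bits * math.log10(2)) + 1

bits = state_bits(3)
print(bits)                         # 26388279066624
print(approx_decimal_digits(bits))  # roughly 8 * 10**12 digits
```

The state count is finite, but merely writing it down in decimal would take trillions of digits, which is why the finite-machine view is not a useful model in practice.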

Many physicists take a quantum-mechanical view of the entire universe and might explain to you that the universe itself is a huge finite-state machine (after all, it has fewer than 10¹⁰⁰ particles, each of them having a quantum state related to Planck's constant, etc.).

In practice, it is better to view your laptop as a Turing machine and to consider its memory potentially "infinite".

(Your question mixes up different levels of abstraction.)

Why do programs need such an extension that is an interpreter all by itself?

Embedding an interpreter is related to partial evaluation and Turing completeness. In practice it is very convenient (but might facilitate malware, given as "data" interpreted by your extensible application). You could embed Lua or Guile in your application (but that is a major design choice that you need to make very early because of its architectural implications, notably related to garbage collection). You could design your application to accept plugins (see dlopen(3) and dlsym(3); my manydl.c program shows that it could have many hundreds of thousands of generated plugins). It might even use JIT compilation techniques (see libgccjit, LLVM, asmjit, ...).
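To make the idea concrete, here is a minimal sketch in Python, where Python's own interpreter stands in for an embedded Lua or Guile. The `Editor` class, its `load_extension` method, and the user script are all hypothetical names invented for this illustration. The host ships with a fixed set of commands, and a user script that arrives as plain data adds a new one at run time.

```python
class Editor:
    """Toy host application with a fixed, built-in set of commands."""

    def __init__(self, text: str = "") -> None:
        self.text = text
        # The "compiled-in" behavior the distributor decided to ship.
        self.commands = {
            "upcase": lambda ed: setattr(ed, "text", ed.text.upper()),
        }

    def load_extension(self, source: str) -> None:
        """Run user-supplied code with access to this editor.

        WARNING: exec() runs arbitrary code. This is exactly the malware
        risk mentioned above, so only load scripts you trust.
        """
        namespace = {"editor": self}
        exec(source, namespace)

    def run(self, name: str) -> None:
        self.commands[name](self)


# Hypothetical user script, arriving as plain data (a string).
USER_SCRIPT = """
def reverse_words(ed):
    ed.text = " ".join(reversed(ed.text.split()))

editor.commands["reverse-words"] = reverse_words
"""

ed = Editor("embed an interpreter")
ed.load_extension(USER_SCRIPT)   # the host gains a command it never shipped with
ed.run("reverse-words")
print(ed.text)  # interpreter an embed
```

Nothing in the host was recompiled or relinked, and the new command has the same standing as the built-in one. A real application would also have to decide how much of its internals to expose to scripts and how to sandbox them.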

Read SICP, then also Scott's Programming Language Pragmatics, Queinnec's Lisp In Small Pieces, and R & A Arpaci-Dusseau's Operating Systems: Three Easy Pieces and Pitrat's blog.

Play with SBCL (a good Common Lisp implementation). It dynamically generates machine code at every REPL interaction. Be aware of homoiconicity, metaprogramming and multi-stage programming. Read about Greenspun's tenth rule and Gall's law and about accidentally Turing-complete things.

Since Turing (and his halting problem, see also Rice's theorem) and Gödel (and his incompleteness theorem) we are aware that code is data (and is proof) and data is code (self-reference is a related concept, see also Richard's paradox and read about the Curry-Howard correspondence). Read Hofstadter's Gödel, Escher, Bach book.
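A small illustration of "code is data and data is code", using Python's standard `ast` module rather than Lisp. The expression and the `SwapAddForSub` transformer are made up for this sketch. A piece of source text is parsed into an ordinary data structure, that structure is edited by a program, and the result is compiled and run again.

```python
import ast

source = "1 + 2 * 3"

# Code becomes data: parse the text into a tree of ordinary Python objects.
tree = ast.parse(source, mode="eval")

class SwapAddForSub(ast.NodeTransformer):
    """Rewrite every `a + b` in the tree into `a - b`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)          # transform sub-expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node

new_tree = ast.fix_missing_locations(SwapAddForSub().visit(tree))

# Data becomes code again: compile each tree and evaluate it.
original = eval(compile(ast.parse(source, mode="eval"), "<original>", "eval"))
rewritten = eval(compile(new_tree, "<rewritten>", "eval"))
print(original, rewritten)  # 7 -5
```

In a homoiconic language such as Lisp the parsing step largely disappears, because programs are already written in the language's own list syntax.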

BTW, your picture of the virtual address space of some Linux process is really naive (it was sometimes true in the previous century; today things are much more complex). See proc(5), elf(5), execve(2), mmap(2), ld-linux(8) and try cat /proc/$$/maps, cat /proc/self/maps, cat /proc/$(pidof emacs)/maps on your Linux system.
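The same inspection can be done from inside a program. The sketch below is Linux-only and reads `/proc/self/maps`, splitting each line into the fields documented in proc(5). The `read_maps` helper is a hypothetical name, and only three of the fields are kept.

```python
from pathlib import Path

def read_maps(pid: str = "self") -> list:
    """Parse /proc/<pid>/maps into a list of dicts (Linux only)."""
    path = Path("/proc") / pid / "maps"
    if not path.exists():          # not Linux, or no such process
        return []
    mappings = []
    for line in path.read_text().splitlines():
        # Fields: address-range perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        mappings.append({
            "range": fields[0],
            "perms": fields[1],
            "path": fields[5] if len(fields) > 5 else "",
        })
    return mappings

# Show the first few mappings of the running Python process itself.
for m in read_maps()[:10]:
    print(m["range"], m["perms"], m["path"])
```

On a typical system the output lists many separate mappings (the executable, shared libraries, heap, stack, anonymous regions), not one text, data, heap and stack block each.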

See also this answer to a related question (and perhaps this one).

Other tips

Elisp is not embedded in Emacs. Emacs is written in Elisp. In some sense, Emacs is embedded in Elisp.

Emacs started out as a set of Editor MACroS for the TECO editor on ITS. It was then ported to Multics. However, the macro language of TECO is a terrible, terrible language, so the authors of Multics Emacs were looking for a better language to re-implement Emacs in. It turned out that the Multics port of MacLisp had much faster function calls and better native text handling than the "standard" system programming language on Multics, PL/I. So, it was decided to implement Emacs in MacLisp.

Later, when Emacs was ported to Unix, they decided that it was easier to write a Lisp implementation and port only that Lisp implementation to new operating systems than always porting the entire Emacs system to a new Lisp implementation … especially since many of the systems they wanted to port Emacs to (like Unix) didn't even have a platform-native Lisp system as part of the OS, like earlier systems had.

Writing their own implementation also allowed them to adapt the language specifically for implementing a text editor.

Note that Emacs is really not special in this regard. Being able to control an application with a program is called "scripting" and there are many scripting languages and scripting hosts. You may have heard of a language called JavaScript that was originally created to script browsers and web servers. Lua is a language that is specifically designed as a scripting language and it is used in e.g. World of Warcraft and Adobe Lightroom. AutoCAD has its own scripting language called AutoLisp. Windows has PowerShell, Unix has the POSIX shell, macOS has AppleScript.

Because with a fully compiled tool, what it can do is limited to what the distributor thinks is useful.

Sure, in theory everyone can write their own extensions in C, recompile and happily use them. But in practice, compiling and reinstalling a tool takes time and effort, and is in fact nigh-impossible for most people who just get their software from a distributor or, worse, from an app store. So the difference between writing an extension function in Lisp and in C is the difference between some minutes of effort and hours, days or weeks of effort or even impossibility - nobody wants to wait until Apple has decided that a new version of a tool can be allowed into their sacred halls, just to use a labor-saving function that they need now.

And why do we need personalizable extension functions? We don't - I can program perfectly fine in most IDEs even though their Emacs "emulation" is ridiculously less powerful than the real thing. But I will be slower and much less happy doing it, and the one thing you don't want to do to expensive specialists like software developers is to make them less efficient. Learning ELisp is a challenge, but if you work with text in any serious way, it pays for itself literally a million times over.

Why do we need to embed an interpreter in a program?

Perhaps the simplest answer is that it makes it easy to extend the program to do things it wasn't originally designed to do, and without having to rebuild the program to do it.

There are countless examples. One of the most obvious is the web browser. It has an embedded JavaScript interpreter. Without it, all we would be able to see are static web pages. With it, however, developers around the world are able to do things that the original web browsers without an embedded language could never do.

I have been trying to add a Forth-like scripting language and REPL to my C# development environment (for a LOT of reasons), and I've been wrestling a lot with the difference between interpretation and execution. Even after being in the industry for many years, I did not expect to be able to switch back and forth between compilation and interpretation so quickly and so effortlessly. Anyway... I have realized that compilation is really translation from one form of code to another, usually from textual code I've written into MSIL, a bytecode for the .NET Framework virtual machine. But if I do some Windows Forms development and use the screen editor, I've created code using a GUI editor, and under the covers it generates C#, or I might be remembering my FoxPro from long ago. It doesn't matter -- the point is that there are multiple representations of source code and multiple representations for the destination code -- and that's the compiling part.

But the compiled part is usually the "frozen" part, and can not change, can not be sculpted to fit the problem at hand. Especially not the way interpreted code can. So that's the first reason for interpretation -- to achieve flexibility at the point of use.
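As an illustration of that flexibility, here is a minimal sketch of a Forth-like interpreter in Python. It is not the C# implementation mentioned above; the word set and the `make_interpreter` helper are assumptions made for this example. The built-in words play the role of the frozen part, and the `: name ... ;` syntax lets a user define new words while the host is running.

```python
def make_interpreter():
    stack = []
    # Built-in words: the "frozen", compiled-in part of the host.
    words = {
        "+": lambda: stack.append(stack.pop() + stack.pop()),
        "*": lambda: stack.append(stack.pop() * stack.pop()),
        "dup": lambda: stack.append(stack[-1]),
    }

    def run(source: str) -> list:
        tokens = iter(source.split())
        for tok in tokens:
            if tok == ":":
                # ": name body ;" defines a new word at run time.
                name = next(tokens)
                body = []
                for t in tokens:
                    if t == ";":
                        break
                    body.append(t)
                words[name] = lambda body=body: run(" ".join(body))
            elif tok in words:
                words[tok]()
            else:
                stack.append(int(tok))   # anything else is a number
        return stack

    return run

run = make_interpreter()
run(": square dup * ;")             # the user extends the language in place
print(run("3 square 4 square +"))   # [25]
```

The word `square` did not exist when the interpreter was built, yet after one line of user input it behaves exactly like a built-in.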

For the interpretation, anything that runs code is interpreting. So, clicking a button is interpreting your action of a mouse click, and running code off of that. So, every application is an interpreter, or more accurately, a whole suite of them. So, your question becomes, "Why do they embed an additional interpreter?" Changes the perspective quite a bit, doesn't it? And then if we start thinking about the XML files and YAML files being used as configuration files for an application, well that's yet another set of interpreters. Even when you compile down to machine instructions, those raw bytes are interpreted by the microcode inside of the chip.

Interpreters are everywhere.

The bible of software development that I found, Code Complete (second edition), states that we developers tend to write the same number of lines of code per day, so instead of writing assembly language, we should be writing in a DSL, a domain-specific language, created and targeted to be most effective at the job that needs to be done. For configuration, that's YAML, or a database table, or an *.ini file. For a game, that might be Lua, embedded to script it. And we work hard to create a DSL to have the most expressiveness, to deal with just the abstractions at hand and focus on only those, and to be able to ignore everything else, to be able to try to limit complexity, and yet express my solutions to my problems as simply and as powerfully as I can. So that's another part of your answer. To enable and empower the user.

We embed an interpreter in order to provide a vocabulary and a set of tools so that we can express the solution to the problem at hand in that dimension (or domain), with maximum ease, simplicity, and power. An embedded interpreter is a Domain-Specific Language, a set of tools, along with the words to use them. Why embed a DSL? To get it done faster, simpler, or to give us a NEW ability to solve problems that we could never solve before, and hopefully solve them correctly. And hopefully more succinctly than I have expressed this answer ;-)

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange