Question

I'm interested in implementing a Forth system, just so I can get some experience building a simple VM and runtime.

When starting in Forth, one typically learns about the stack and its operators (DROP, DUP, SWAP, etc.) first, so it's natural to think of these as being among the primitive operators. But they're not. Each of them can be broken down into operators that directly manipulate memory and the stack pointers. Later one learns about store (!) and fetch (@) which can be used to implement DUP, SWAP, and so forth (ha!).

So what are the primitive operators? Which ones must be implemented directly in the runtime environment from which all others can be built? I'm not interested in high-performance; I want something that I (and others) can learn from. Operator optimization can come later.

(Yes, I'm aware that I can start with a Turing machine and go from there. That's a bit extreme.)

Edit: What I'm aiming for is akin to bootstrapping an operating system or a new compiler. What do I need do implement, at minimum, so that I can construct the rest of the system out of those primitive building blocks? I won't implement this on bare hardware; as an educational exercise, I'd write my own minimal VM.

Was it helpful?

Solution

This thread covers your exact question. Here is a soup-to-nuts implementation with complete documentation.

I wrote a subroutine threaded Forth targeting 68K when I was in college. I defined the runtime environment and dictionary format, then wrote some C code that boot strapped a Macintosh application that loaded a default dictionary, populated some I/O vectors and got the code running. Then I took the Leo Brodie book Starting Forth and started implementing the basic dictionary in 68K assembly language. I started with arithmetic/logic words, then did control structures then word definition/manipulation words. My understanding is that at a minimum you need @, !, +, -, * and /. The rest can be implemented in terms of those, but that's like trying to write an entire graphics library based on SetPixel and GetPixel: it will work, but yikes, why?

I enjoyed the process as there were some really interesting puzzles, like getting DOES> exactly right (and once I had a solid DOES> implementation, I was creating closures that turned into tiny, tiny amounts of code).

OTHER TIPS

A long time ago, I had a book called "Threaded Interpretive Languages", published I think by Byte, that discussed how to implement a Forth-like language (I don't think they ever called it Forth) in Z80 assembly.

You may not have a Z80 handy, or want one, but the book might be instructive.

This post at comp.lang.forth lists a few "minimal Forths".

http://groups.google.com/group/comp.lang.forth/msg/10872cb68edcb526

Why do I know this? My brother, Mikael, wrote #3 and he also wrote a paper about making a "minimal Forth" (in Swedish, though). If I remember correctly he wanted to get a minimal set of operators that could be built in silicon.

I'm still not convinced the question is well-formed. For example, Plinth's instructions can be reduced; after all, * and / can be implemented in terms of + and -, but then '+' can be implemented in terms of a successor function (see the Peano axioms.) Which puts you into the neighborhood of a Turing machine. How do you know where to stop?

You might also want to take a look at Hans Bezemer's 4tH compiler.

Which Forth implementation are you using that doesn't provide this information in the documentation? Given the nature of Forth, it might be implementation-dependent. There's a standard set of words in the dictionary, but whether they got there by assembly/C/whatever or by Forth shouldn't matter, since Forth is by definition a self-extensible language.

Contrary to what you say, generally DROP SWAP etc are considered basic Forth operations. The reason is that if you implement them using memory operations like you suggest, the overall system becomes more, not less complicated. Also there is no clear distinction in Forth between what is basic and what not. In the 80's a dictionary search would be basic, and coded in assembler for speed, while a modern linux hosted can afford to code that in so called high level. Also Forthers tend to routinely recode assembler words in high level, and high level words in assembler. I'm the author of ciforth and yourforth. It is possible to define <= as "> not" as I did in ciforth . But in yourforth I decided that having all of < <= > >= as similar, uniformly looking, small assembler routines was effectively simpler. That is a judgement call, and a matter of taste, certainly not a matter of principle.

In the context I interpret the question as: "What is a reasonable size for the number of primitive operations to arrive at a reasonable powerful Forth with a reasonable speed?" Clearly you are not interested in clever tricks to get rid of one assembler word at the expense of tremendous overhead as found in some of the threads discussing this subject.

Now you can look at a number of small Forth's like jonesforth yourforth eforth and conclude that mostly one arrives at around 50 to 100 primitives. Those Forth's are defined in assembler. If you want to define your primitives in c, python or Java , the situation is again different. Now for e.g. the above dictionary search you have a choice between c and Forth. Considerations that have nothing to do with language design come into play. You may be a prolific c-programmer, or you may insist on coding it in Forth because it is a learning project.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top