Algorithmic Complexity Analysis: practically using Knuth's Ordinary Operations (oops) and Memory Operations (mems) method

StackOverflow https://stackoverflow.com/questions/15464410

Question

In implementing most algorithms (sort, search, graph traversal, etc.), there is frequently a trade-off that can be made in reducing memory accesses at the cost of additional ordinary operations.

Knuth has a useful method for comparing the complexity of various algorithm implementations by abstracting it from particular processors and only distinguishing between ordinary operations (oops) and memory operations (mems).

In compiled programs, one typically lets the compiler organise the low level operations, and hopes that the operating system will handle the question of whether data is held in cache memory (faster) or in virtual memory (slower). Furthermore, the exact number / cost of instructions is encapsulated by the compiler.

With Forth, there is no longer such encapsulation, and one is much closer to the machine, albeit perhaps to a stack machine running on top of a register processor.

Ignoring the effect of an operating system (so no memory stalls, etc.), and assuming for the moment a simple processor,

(1) Can anyone advise on how the ordinary stack operations in Forth (e.g. dup, rot, over, swap, etc.) compare with the cost of Forth's memory access fetch (@) or store (!) ?

(2) Is there a rule of thumb I can use to decide how many ordinary operations to trade-off against saving a memory access?

What I'm looking for is something like 'memory access costs as much as 50 ordinary ops, or 500 ordinary ops, or 5 ordinary ops' Ballpark is absolutely fine.

I'm trying to get a sense of the relative expense of fetch and store vs. rot, swap, dup, drop, over, correct to an order of magnitude.

Was it helpful?

Solution 2

A comparison between memory fetches and register operations is okay for assembler programs, as it is for the output of c-compilers, which is in fact an assembler program. In Forth this question hardly makes sense. In the first place Forth is an interpreter and in using Forth one foregoes the ultimate in speed. Of course one could add an optimiser on top of Forth but then the question makes even less sense, because the output of a c-optimiser and a Forth optimiser converge to -- you guessed it -- an optimal solution.

Let's look at an elementary operation in Forth like AND. This is implemented as

> CODE AND
>     POP AX
>     POP BX
>     AND AX, BX
>     PUSH AX
>     NEXT

So we see already three memory operations for something that looks like an elementary calculation operation. It appears the Knuth metric is not applicable. Also Forth seems to be loosing big time.That is however not true. Those memory operations are all onto the L1 cache of a typical processor. That is about as efficient as local variables in small c functions, We can compare stack operations with memory operations using VARIABLE's and the stack. The answer is simple. A VARIABLE risks a memory stall. A stack operation will almost certainly be a L1 cache hit. This is the single most important point of consideration. However the question explicitly asks not to consider it! So there.

OTHER TIPS

This article How much time does it take to fetch one word from memory? talks about main memory stall times, with some rule of thumb type numbers, but basically you can do lots of instructions while stalling for main memory. As others have said, the numbers vary a lot between systems.

Main memory stalls is a big area of interest, especially as CPUs have more cores, but typically not much faster memory bandwidth. There is some research going on around compressing data in main memory too, so that the CPU can take advantage of 'spare' cycles and tightly packed cache lines http://oai.cwi.nl/oai/asset/15564/15564B.pdf

For those who are really interested in the details, most CPU manufacturers publish in depth guides on memory optimisations etc. mostly aimed at high end and compiler writers, but readable by all 2gl and 3gl programmers.

Ps. Go Forth.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top