Tail call optimization in Racket

https://stackoverflow.com/questions/11508987

21-06-2021

Question

I was doing SICP exercise 2.28 and stumbled upon some strange behaviour in the following code:

(define (fringe tree)
  (cond
    ((null? tree) '())
    ((not (pair? tree)) (list tree))
    (else (append (fringe (car tree)) (fringe (cdr tree))))))

(define (fringe-tail tree)
  (define (fringe-iter tree result)
    (cond
      ((null? tree) result)
      ((not (pair? tree)) (list tree))
      (else (fringe-iter (cdr tree) (append result (fringe-tail (car tree)))))))
  (fringe-iter tree '()))

(define x (make-list (expt 10 4) 4))
(time (fringe x))
(time (fringe-tail x))

The ordinary fringe runs much faster than its supposedly iterative version, fringe-tail:

cpu time: 4 real time: 2 gc time: 0

vs.

cpu time: 1063 real time: 1071 gc time: 191

It looks like fringe was optimized into a loop and avoids any allocations, while fringe-tail runs much slower and spends its time creating and destroying objects.

Can anyone explain this to me? (In case it matters, I'm using Racket 5.2.1.)

Solution

If you replace the last clause with:

(else (fringe-iter (cdr tree) (append (fringe-tail (car tree)) result)))

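then they run at the same speed for that input, and the tail-recursive version is faster for larger input. Be aware that this change also reverses the order of the leaves in the result (for example, '(1 2 3) yields '(3 2 1)), which goes unnoticed here because every element of x is 4; reverse the final result if the order matters.

The problem is the argument order. append copies its first argument, and the original code passes the ever-growing accumulated result there, so every step traverses and re-allocates everything collected so far. The naive version only ever puts the fringe of the car in front, which is short for this input.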
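To make the cost difference concrete, here is a minimal sketch that counts how many pairs get copied under each argument order. counting-append and accumulate are hypothetical helpers written only for this illustration (they are not part of Racket); counting-append mimics what append does with two lists, copying the first and sharing the second.

```racket
#lang racket

;; Number of pairs allocated by counting-append since the last reset.
(define copies 0)

;; Hypothetical stand-in for (append xs ys): copies every pair of xs
;; and reuses ys as the tail, counting each copied pair.
(define (counting-append xs ys)
  (cond
    [(null? xs) ys]
    [else
     (set! copies (add1 copies))
     (cons (car xs) (counting-append (cdr xs) ys))]))

;; Fold the items of a flat list into a result, one item at a time, and
;; return the number of pairs copied. result-first? selects the argument
;; order: #t is (append result item-list), as in the question;
;; #f is (append item-list result), as in the replacement clause.
(define (accumulate items result-first?)
  (set! copies 0)
  (for/fold ([result '()]) ([item (in-list items)])
    (if result-first?
        (counting-append result (list item))
        (counting-append (list item) result)))
  copies)

(accumulate (range 100) #t) ; 0 + 1 + ... + 99 = 4950 pairs copied
(accumulate (range 100) #f) ; 1 pair per item = 100 pairs copied
```

Scaled to the 10,000-element list from the question, the first order copies about 50 million pairs in total, while the second copies 10,000.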

Other tips

The given code has applications in non-tail position, so the function is not iterative, despite its name. :)
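As a reminder of what tail position means, here is a minimal sketch using a hypothetical list-summing function (sum-list and sum-list/iter are made up for illustration and are not from the question). In the first version the result of the recursive call is still needed by +, so it is not a tail call; in the second, the call to loop is the entire result of the branch.

```racket
#lang racket

;; Not a tail call: + still has work to do after the recursive call returns,
;; so each call needs its own stack frame.
(define (sum-list xs)
  (if (null? xs)
      0
      (+ (car xs) (sum-list (cdr xs)))))

;; Tail call: the call to loop is the last thing the branch does,
;; so Racket can run it as an iteration in constant stack space.
(define (sum-list/iter xs)
  (let loop ([xs xs] [acc 0])
    (if (null? xs)
        acc
        (loop (cdr xs) (+ acc (car xs))))))

(sum-list '(1 2 3))      ; 6
(sum-list/iter '(1 2 3)) ; 6
```

In the question's fringe-tail, the call (fringe-tail (car tree)) is an argument to append, so it is in the same situation as the recursive call in sum-list; only the outer call to fringe-iter is a tail call.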

Try this:

(define (fringe-tail tree)
  (define (iter tree k)
    (cond
      [(null? tree)
       (k '())]
      [(not (pair? tree)) 
       (k (list tree))]
      [else
       (iter (car tree)
             (lambda (v1)
               (iter (cdr tree)
                     (lambda (v2)
                       (k (append v1 v2))))))]))
  (iter tree (lambda (a-fringe) a-fringe)))

However, it still uses append, whose cost is proportional to the length of its first argument. Certain degenerate inputs into fringe and fringe-tail will cause a lot of computational suffering.

Let's give an example of such degenerate inputs:

(define (build-evil-struct n)
  (if (= n 0)
      (list 0)
      (list (list (build-evil-struct (sub1 n)))
            (build-evil-struct (sub1 n))
            (list n))))

(define evil-struct (build-evil-struct 20))
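To get a feel for how large this input is, the leaf count can be worked out from the shape of the definition: each level contains two copies of the previous level plus one extra leaf. The sketch below checks that recurrence against a small real structure; evil-leaf-count is a hypothetical helper introduced here, not part of the original answer.

```racket
#lang racket

;; Same definition as above, repeated so the sketch is self-contained.
(define (build-evil-struct n)
  (if (= n 0)
      (list 0)
      (list (list (build-evil-struct (sub1 n)))
            (build-evil-struct (sub1 n))
            (list n))))

;; Hypothetical helper: leaf count of (build-evil-struct n), computed from
;; the recurrence L(0) = 1, L(n) = 2 L(n-1) + 1, without building anything.
(define (evil-leaf-count n)
  (if (= n 0)
      1
      (+ 1 (* 2 (evil-leaf-count (sub1 n))))))

;; Cross-check against a real (small) structure using the built-in flatten.
(= (length (flatten (build-evil-struct 5))) (evil-leaf-count 5)) ; #t

(evil-leaf-count 20) ; 2097151, i.e. 2^21 - 1
```

So evil-struct has roughly two million leaves, buried in nested lists that force append to copy long intermediate results again and again.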

When fringe and fringe-tail are applied to evil-struct, you'll see very bad performance: I observe seconds of compute time on my own system for each of them. These tests were run under DrRacket with debugging disabled. If you enable debugging, your numbers will be significantly different.

> (time (void (fringe evil-struct)))
cpu time: 2600 real time: 2602 gc time: 1212

> (time (void (fringe-tail evil-struct)))
cpu time: 4156 real time: 4155 gc time: 2740

In both cases, the use of append is what makes them susceptible to certain degenerate inputs. If we write an accumulating version of fringe, we can eliminate that cost, since we then get to use the constant-time cons operation:

(define (fringe/acc tree)
  (define (iter tree acc)
    (cond [(null? tree)
           acc]
          [(not (pair? tree))
           (cons tree acc)]
          [else
           (iter (car tree) (iter (cdr tree) acc))]))
  (iter tree '()))

Let's look at the performance of fringe/acc on this structure:

> (time (void (fringe/acc evil-struct)))
cpu time: 272 real time: 274 gc time: 92

Much better! And it's a straightforward matter to turn all the calls here into tail calls:

(define (fringe/acc/tail tree)
  (define (iter tree acc k)
    (cond [(null? tree)
           (k acc)]
          [(not (pair? tree))
           (k (cons tree acc))]
          [else
           (iter (cdr tree) acc
                 (lambda (v1)
                   (iter (car tree) v1 k)))]))
  (iter tree '() (lambda (v) v)))

> (time (void (fringe/acc/tail evil-struct)))
cpu time: 488 real time: 488 gc time: 280

In this particular case, Racket's own stack implementation is a bit faster than the reified stack we represent with continuations, so fringe/acc is faster than fringe/acc/tail. Still, both are significantly better than fringe because they avoid append.

All this being said: this function is already built into Racket as flatten! So you might as well just use that if you don't want to reinvent the wheel. :)
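A quick sketch of flatten in use, compared against the accumulating version from above. The sample tree is made up for illustration; fringe/acc is repeated so the block is self-contained.

```racket
#lang racket

;; fringe/acc as defined above, repeated so the sketch is self-contained.
(define (fringe/acc tree)
  (define (iter tree acc)
    (cond [(null? tree) acc]
          [(not (pair? tree)) (cons tree acc)]
          [else (iter (car tree) (iter (cdr tree) acc))]))
  (iter tree '()))

;; Illustrative input with nesting and an empty sublist.
(define tree '(1 (2 (3 4)) () 5))

(flatten tree)                            ; '(1 2 3 4 5)
(equal? (flatten tree) (fringe/acc tree)) ; #t
```

flatten comes from racket/list, which #lang racket already provides, so no extra require is needed.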

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow