Question

I'm playing around with Racket and missed a byte-string comprehension. When I found for/fold/derived with examples in the documentation, I decided to roll my own byte-string comprehension macro, as any beginner would:

(define-syntax (for/bytes stx)
  (syntax-case stx ()
    ((_ clauses . defs+exprs)
     (with-syntax ((original stx))
       #'(let-values
             (((bstr i max-length)
               (for/fold/derived original
                                 ((bstr (make-bytes 16)) (c 0) (ln-incr 32))
                                 clauses
                 (define el (let () . defs+exprs))
                 (let-values (((new-bstr new-ln-incr)
                               (if (eq? c (bytes-length bstr))
                                   (values (bytes-append bstr (make-bytes ln-incr))
                                           (* ln-incr 2))
                                   (values bstr ln-incr))))
                   (bytes-set! new-bstr c el)
                   (values new-bstr (+ c 1) new-ln-incr)))))
           (subbytes bstr 0 i))))))
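For what it's worth, here is a quick sanity check (assuming the for/bytes definition above is in scope) that the macro agrees with the list-based version, including past the initial 16-byte buffer:

;; Sanity check: for/bytes should produce the same bytes as the
;; list->bytes/for/list combination, even when the buffer must grow.
(equal? (for/bytes ((i (in-range 300))) (modulo i 256))
        (list->bytes (for/list ((i (in-range 300))) (modulo i 256))))
;; => #t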

I've got a few related questions:

  1. Is this the Racket way anyhow?
  2. Is the macro ok? Basically I combined the examples from the for/fold/derived documentation with a macro-expanded for/vector
  3. Are there any obvious performance optimizations?

Sadly, it's not really faster than (list->bytes (for/list ... This micro-benchmark:

(define size 50000)
(define (custom-byte-test) (for/bytes ((i (in-range size))) (modulo i 256)))
(define (standard-list-test) (list->bytes (for/list ((i (in-range size))) (modulo i 256))))
(profile-thunk custom-byte-test #:repeat 1000)
(profile-thunk standard-list-test #:repeat 1000)

gives 3212ms vs 3690ms. For sizes much smaller than 50000 my for/bytes loses, for sizes bigger than that it wins.


Solution

My answers:

Is this the Racket way anyhow?

Yes.

Is the macro ok? Basically I combined the examples from the for/fold/derived documentation with a macro-expanded for/vector

Yes, I think it looks good.

Are there any obvious performance optimizations? Sadly, it's not really faster than (list->bytes (for/list ...

I'm not aware of how to do it faster. The "win" here is that the complexity of buffer resizing is hidden from users of for/bytes.

OTHER TIPS

I sped your code up a little.

1) The bytes-length call in the inner loop isn't needed, since you already track the current length. I replaced your ln-incr with a bstr-len that serves as both the current length and the amount by which to grow the buffer.

This gave ~15% improvement.

2) Since you already do length checking, you can safely use unsafe-bytes-set!, which speeds things up another ~10%.

On my machine, custom-byte-test is now ~1200ms vs ~1750ms for standard-list-test.

#lang racket
(require racket/unsafe/ops profile)

(define-syntax (for/bytes stx)
  (syntax-case stx ()
    [(_ clauses . defs+exprs)
     (with-syntax ([original stx])
       #'(let ([init-bstr-len 32])
           (let-values
             ([(bstr i max-length)
               (for/fold/derived 
                original 
                ([bstr (make-bytes init-bstr-len)]
                 [c 0]
                 [bstr-len init-bstr-len]) ; <-- use as curr len + extend len
                clauses
                (define el (let () . defs+exprs))
                (let-values 
                  ([(new-bstr new-bstr-len)
                    (if (= c bstr-len) ; <-- remove len calculation
                        (values 
                          (bytes-append bstr (make-bytes bstr-len))
                          (* bstr-len 2))
                        (values bstr bstr-len))])
                  (unsafe-bytes-set! new-bstr c el) ; <-- unsafe op
                  (values new-bstr (add1 c) new-bstr-len)))])
             (subbytes bstr 0 i))))]))

(define size 50000)
(define (custom-byte-test) 
  (for/bytes ([i (in-range size)]) (modulo i 256)))
(define (standard-list-test) 
  (list->bytes (for/list ([i (in-range size)]) (modulo i 256))))
(profile-thunk custom-byte-test #:repeat 1000)
(profile-thunk standard-list-test #:repeat 1000)
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow