Although well-behaved web servers won't give you gzipped responses unless you give them an `Accept-Encoding: gzip` request header, not every web server is well-behaved.
So, you need to look for the `Content-Encoding: gzip` response header and use `gunzip-through-ports`. (You can do the same for `Content-Encoding: deflate` and `inflate`.)
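To get a feel for `gunzip-through-ports` without involving a web server, here is a minimal sketch that compresses a byte string in memory and decompresses it again. The helper names `gzip-bytes` and `gunzip-bytes` are made up for this illustration, and `gzip-through-ports` from `file/gzip` only stands in for whatever a server would do to the response body.

```racket
#lang racket
(require file/gzip
         file/gunzip)

;; Hypothetical helper: gzip a byte string in memory.
;; The #f and 0 arguments are the (absent) original filename and a timestamp.
(define (gzip-bytes bstr)
  (define out (open-output-bytes))
  (gzip-through-ports (open-input-bytes bstr) out #f 0)
  (get-output-bytes out))

;; Hypothetical helper: the reverse direction, using gunzip-through-ports.
(define (gunzip-bytes bstr)
  (define out (open-output-bytes))
  (gunzip-through-ports (open-input-bytes bstr) out)
  (get-output-bytes out))

(define original #"Hello, gzip!")
(define round-tripped (gunzip-bytes (gzip-bytes original)))
(equal? round-tripped original)
```

Both functions work on ports rather than byte strings, which is why they slot in so easily once you have the response port in hand.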
Of course, to "look for the response header" you can't use `get-pure-port` anymore; you have to use `get-impure-port` and `purify-port`. A rough sketch:

    #lang racket
    (require net/url
             net/head
             file/gunzip)
    (define u (string->url "http://www.wikipedia.org"))
    ;; Ask for gzip; the port still has the response headers on it.
    (define in (get-impure-port u '("Accept-Encoding: gzip")))
    ;; Read the headers off the port, leaving only the body.
    (define h (purify-port in))
    (define out (open-output-bytes))
    ;; Gunzip only if the server says the body is gzipped.
    (match (extract-field "Content-Encoding" h)
      ["gzip" (gunzip-through-ports in out)]
      [_ (copy-port in out)])
    (define bstr (get-output-bytes out))
    (close-input-port in)

p.s. I think the above is easier to explore when trying it out for the first time. But for production code I'd probably use `call/input-url` to help handle closing the port:

    #lang racket
    (require net/url
             net/head
             file/gunzip)
    (define u (string->url "http://www.wikipedia.org"))
    (define bstr
      (call/input-url u
                      (curryr get-impure-port '("Accept-Encoding: gzip"))
                      (lambda (in)
                        (define h (purify-port in))
                        (define out (open-output-bytes))
                        (match (extract-field "Content-Encoding" h)
                          ["gzip" (gunzip-through-ports in out)]
                          [_ (copy-port in out)])
                        (get-output-bytes out))))

p.p.s. That version might be even clearer if it didn't use `curryr` and an anonymous function. For example:

    #lang racket
    (require net/url
             net/head
             file/gunzip)

    ;; Like get-impure-port, but supplies an Accept-Encoding: gzip
    ;; request header.
    (define (get-impure-port/gzip u)
      (get-impure-port u '("Accept-Encoding: gzip")))

    ;; Read the response headers using purify-port, then read the
    ;; response entity, handling gzip encoding.
    (define (read-response in)
      (define h (purify-port in))
      (define out (open-output-bytes))
      (match (extract-field "Content-Encoding" h)
        ["gzip" (gunzip-through-ports in out)]
        [_ (copy-port in out)])
      (get-output-bytes out))

    (define bstr
      (call/input-url (string->url "http://www.wikipedia.org")
                      get-impure-port/gzip
                      read-response))
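
A nice side effect of splitting things up this way is that `read-response` only needs an input port, so you can probably try it without touching the network. The following is a sketch under that assumption: `fake-response` is a hypothetical helper that builds an in-memory port shaped like a raw HTTP response (status line, optional `Content-Encoding: gzip` header, blank line, body), and `read-response` is repeated so the block runs on its own.

```racket
#lang racket
(require net/url
         net/head
         file/gzip
         file/gunzip)

;; Same logic as read-response above, repeated for self-containment.
(define (read-response in)
  (define h (purify-port in))
  (define out (open-output-bytes))
  (match (extract-field "Content-Encoding" h)
    ["gzip" (gunzip-through-ports in out)]
    [_ (copy-port in out)])
  (get-output-bytes out))

;; Hypothetical helper: an input port that looks like a raw HTTP response.
;; When gzip? is true the body is gzipped and labelled as such.
(define (fake-response body gzip?)
  (define payload
    (if gzip?
        (let ([out (open-output-bytes)])
          (gzip-through-ports (open-input-bytes body) out #f 0)
          (get-output-bytes out))
        body))
  (open-input-bytes
   (bytes-append #"HTTP/1.1 200 OK\r\n"
                 (if gzip? #"Content-Encoding: gzip\r\n" #"")
                 #"\r\n"
                 payload)))

;; Both calls should hand back the same plain body.
(define from-gzip  (read-response (fake-response #"hello, world" #t)))
(define from-plain (read-response (fake-response #"hello, world" #f)))
```

This exercises both branches of the `match`: the gzipped body goes through `gunzip-through-ports`, and the plain one falls through to `copy-port`.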