Python3 progress bar and download with gzip

Question 1

Perhaps you should try disabling gzip compression or otherwise accounting for it.

The way to turn it off for requests (when using a session as you say you are):

import requests

s = requests.Session()
del s.headers['Accept-Encoding']

The header sent will now be Accept-Encoding: identity, and the server should not attempt to use gzip compression. If instead you're trying to download a file that is itself gzip-encoded, you should not run into this problem: you will receive a Content-Type such as application/x-gzip-compressed. If the website's response is gzip compressed in transit, you'll receive a Content-Type of text/html, for example, and a Content-Encoding of gzip.

If the server always serves compressed content then you're out of luck, but no server should do that.

If you want to do the same thing with the functional API of requests:

import requests

r = requests.get('url', headers={'Accept-Encoding': None})

Setting the header value to None via the functional API (or even in a call to session.get) removes that header from the request.
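To check the header handling without touching the network, you can prepare a request and inspect it instead of sending it. This is only a minimal sketch: the URL is a placeholder, and the variable names (default_value, header_removed) are made up for illustration.

```python
import requests

session = requests.Session()

# By default, a requests session advertises compressed encodings.
default_value = session.headers.get('Accept-Encoding')

# Build the request without sending it. The URL is a placeholder and is
# never contacted; preparing a request does no network I/O.
request = requests.Request(
    'GET',
    'https://example.invalid/file',
    headers={'Accept-Encoding': None},  # None asks requests to drop the header
)
prepared = session.prepare_request(request)

# After merging session and per-request headers, the None-valued key is gone.
header_removed = 'Accept-Encoding' not in prepared.headers
print(default_value, header_removed)
```

If the header is absent from the prepared request, requests itself is no longer advertising gzip for that call.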

Question 2

You could replace...

dl += len(byte)

...with:

dl = response.raw.tell()

From the documentation:

tell(): Obtain the number of bytes pulled over the wire so far. May differ from the amount of content returned by HTTPResponse.read() if bytes are encoded on the wire (e.g., compressed).
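The difference between the two counters can be reproduced with the standard library alone. The sketch below is not requests code: it compresses a made-up payload, feeds it to a decompressor in small pieces, and keeps one counter for bytes "on the wire" (what tell() reports) and another for decoded bytes (what summing len(chunk) reports).

```python
import gzip
import zlib

# Hypothetical payload standing in for a response body.
body = b"hello world\n" * 10_000
# What would travel over the wire with Content-Encoding: gzip.
wire = gzip.compress(body)

# 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)

wire_count = 0     # analogous to response.raw.tell()
decoded_count = 0  # analogous to dl += len(chunk) on decoded content

for start in range(0, len(wire), 64):
    piece = wire[start:start + 64]
    wire_count += len(piece)
    decoded_count += len(decoder.decompress(piece))
decoded_count += len(decoder.flush())

print(wire_count, decoded_count)
```

Content-Length describes the compressed size, so only the wire counter can be compared against it; the decoded counter will overshoot whenever compression is in use.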

Question 3

Here is a simple progress bar implemented with tqdm:

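import gzip

from tqdm import tqdm


def _reader_generator(reader):
    # Yield 1 MiB chunks until the reader returns an empty bytes object.
    b = reader(1024 * 1024)
    while b:
        yield b
        b = reader(1024 * 1024)


def raw_newline_count_gzip(fname):
    # Count newlines in the decompressed data; this becomes the bar's total.
    with gzip.open(fname, 'rb') as f:
        f_gen = _reader_generator(f.read)
        return sum(buf.count(b'\n') for buf in f_gen)


# fname is the path to your gzip file.
num = raw_newline_count_gzip(fname)
with gzip.open(fname, 'rb') as f, tqdm(total=num) as pbar:
    for line in f:
        # do whatever you want with the line
        pbar.update(1)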

The bar looks like: 35%|███▌ | 26288/74418 [00:05<00:09, 5089.45it/s]
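Since the bar's total comes from a newline count over the decompressed data, it can be worth checking that count on a file whose size you know. The following is a small sketch using only the standard library; count_lines_gzip and sample.txt.gz are names invented for this example.

```python
import gzip
import os
import tempfile


def count_lines_gzip(fname, chunk_size=1024 * 1024):
    """Count newline bytes in the decompressed contents of a gzip file."""
    count = 0
    with gzip.open(fname, 'rb') as f:
        # iter() with a sentinel stops once read() returns b'' at end of file.
        for buf in iter(lambda: f.read(chunk_size), b''):
            count += buf.count(b'\n')
    return count


# Write a throwaway gzip file with a known number of lines, then count them.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'sample.txt.gz')
    with gzip.open(sample_path, 'wb') as f:
        f.write(b'line\n' * 1000)
    total = count_lines_gzip(sample_path)

print(total)
```

Note that a final line without a trailing newline is not counted, so the bar can end one step short of the total on such files.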