Question

I'm using Go 1.1 devel on Ubuntu 13.04:

go version devel +ebe8bca920ad Wed May 15 15:34:47 2013 +1000 linux/386

According to http://golang.org/doc/faq#goroutines:

When a coroutine blocks, such as by calling a blocking system call, the run-time automatically moves other coroutines on the same operating system thread to a different, runnable thread so they won't be blocked.

I'm trying to write a downloader which can download a large file in chunks using goroutines and this is the best goroutine I've come up with:

func download(uri string, chunks chan int, offset int, file *os.File) {
    for current := range chunks {

        fmt.Println("downloading range: ", current, "-", current+offset)

        client := &http.Client{}
        req, _ := http.NewRequest("GET", uri, nil)
        req.Header.Set("Range: ", fmt.Sprintf("bytes=%d-%d", current, current+offset))
        resp, err := client.Do(req)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        body, err := ioutil.ReadAll(resp.Body)
        if err != nil {
            panic(err)
        }
        file.Write(body)
    }
}

The full script is available at https://github.com/tuxcanfly/godown/blob/master/godown.go

Even though the files are being downloaded and saved correctly, I can see that the second chunk starts only when the first one finishes.

Shouldn't the chunked downloads run in parallel, or is there something I'm doing wrong?


Solution

You only have one goroutine downloading chunks.

Line 64:

go download(*download_url, chunks, offset, file)

What you probably want is:

for i := 0; i < *threads; i++ {
    go download(*download_url, chunks, offset, file)
}

This will download *threads chunks at once.
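
As a rough illustration of that pattern, here is a minimal sketch in which several goroutines receive from one shared channel. The names processChunks and handle are hypothetical and not taken from the linked script; handle stands in for the real download logic, and a sync.WaitGroup is assumed as the way to wait for the workers.

```go
package main

import (
	"fmt"
	"sync"
)

// processChunks starts `workers` goroutines that all receive from the same
// channel, so up to `workers` chunks are handled at the same time.
// handle is a placeholder for the download; it returns the bytes it fetched.
func processChunks(starts []int, workers int, handle func(start int) int) int {
	chunks := make(chan int)
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for start := range chunks {
				n := handle(start)
				mu.Lock()
				total += n
				mu.Unlock()
			}
		}()
	}

	for _, s := range starts {
		chunks <- s
	}
	close(chunks) // lets the range loops in the workers terminate
	wg.Wait()     // block until every worker has returned
	return total
}

func main() {
	total := processChunks([]int{0, 1000, 2000}, 3, func(start int) int {
		return 1000 // pretend each chunk is 1000 bytes
	})
	fmt.Println("bytes handled:", total)
}
```

With a single worker the chunks are handled strictly one after another, which matches the behaviour described in the question.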


After you have concurrency working, you will probably notice that line 29 (the file.Write call) doesn't work the way you intend. If chunk 2 finishes before chunk 1, the parts will be written out of order. You may want to use http://golang.org/pkg/os/#File.WriteAt instead.
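
To show why WriteAt helps, here is a small sketch that writes two parts in the "wrong" order and still ends up with the right file. The part type and the assemble helper are hypothetical names made up for this example, and a temporary file stands in for the real output file.

```go
package main

import (
	"fmt"
	"os"
)

// part pairs a chunk's data with its byte offset in the final file.
type part struct {
	offset int64
	data   []byte
}

// assemble writes every part at its own offset into a temporary file and
// returns the resulting file contents. Because WriteAt takes an explicit
// offset, the order in which parts arrive does not matter.
func assemble(parts []part) (string, error) {
	f, err := os.CreateTemp("", "godown-*")
	if err != nil {
		return "", err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	for _, p := range parts {
		if _, err := f.WriteAt(p.data, p.offset); err != nil {
			return "", err
		}
	}

	data, err := os.ReadFile(f.Name())
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	// The second half "arrives" first, as it might with concurrent downloads.
	out, err := assemble([]part{
		{offset: 5, data: []byte("world")},
		{offset: 0, data: []byte("hello")},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints "helloworld"
}
```

In the downloader, the offset would be the chunk's start position, i.e. the value received from the chunks channel.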


You also have two problems with your Range header.

  1. You don't download the remainder. If the file size is 3002 and you have 3 threads, it will request 0-1000, 1000-2000, 2000-3000 and the last 2 bytes will never be downloaded.
  2. Byte ranges are inclusive. That means you are (as you can see in the previous example) downloading some bytes twice: bytes 1000 and 2000 are each requested twice. Of course, as long as you write to the correct locations, you shouldn't have too much of a problem.
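
Both points can be handled when the ranges are computed. The following is one possible sketch, not the linked script's code: byteRanges is a hypothetical helper that returns inclusive, non-overlapping ranges and lets the last range absorb the remainder. It assumes size >= parts > 0.

```go
package main

import "fmt"

// byteRanges splits a file of `size` bytes into `parts` inclusive
// [start, end] ranges. The last range is extended to size-1 so that any
// remainder from the integer division is still downloaded.
// Assumes size >= parts > 0.
func byteRanges(size, parts int64) [][2]int64 {
	chunk := size / parts
	ranges := make([][2]int64, 0, parts)
	for i := int64(0); i < parts; i++ {
		start := i * chunk
		end := start + chunk - 1 // inclusive, so stop one byte short of the next start
		if i == parts-1 {
			end = size - 1
		}
		ranges = append(ranges, [2]int64{start, end})
	}
	return ranges
}

func main() {
	for _, r := range byteRanges(3002, 3) {
		fmt.Printf("bytes=%d-%d\n", r[0], r[1])
	}
	// Output:
	// bytes=0-999
	// bytes=1000-1999
	// bytes=2000-3001
}
```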

Number two is easy enough to fix by changing line 19 from

req.Header.Set("Range: ", fmt.Sprintf("bytes=%d-%d", current, current+offset))

to this

req.Header.Set("Range: ", fmt.Sprintf("bytes=%d-%d", current, current+offset-1))

For more information on the Range header, I suggest reading Section 14.35 of RFC 2616.
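
For a way to check a Range request end to end, here is a hedged sketch that serves ten bytes from a local test server and fetches bytes 2 through 5. fetchRange and demo are hypothetical names; the test server uses http.ServeContent, which honours Range requests. Note that this sketch sets the header name as "Range", without the trailing colon and space, since those characters are not valid in a header field name.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// fetchRange requests the inclusive byte range [start, end] from url and
// returns the body. It fails unless the server answers 206 Partial Content,
// because a 200 would mean the Range header was ignored.
func fetchRange(url string, start, end int64) ([]byte, error) {
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusPartialContent {
		return nil, fmt.Errorf("expected 206 Partial Content, got %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// demo starts a throwaway local server and fetches bytes 2-5 of "0123456789".
func demo() (string, error) {
	const content = "0123456789"
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeContent(w, r, "", time.Time{}, strings.NewReader(content))
	}))
	defer srv.Close()

	body, err := fetchRange(srv.URL, 2, 5)
	return string(body), err
}

func main() {
	got, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(got) // prints "2345": four bytes, because both ends are inclusive
}
```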

Licensed under: CC-BY-SA with attribution