Question

I'm writing a custom FTP manager in Python for some organizational code at my company. I'm just extending the basic retrbinary to give the user some feedback: percent downloaded, time elapsed, that sort of thing. It occurred to me that I could save downloaded blocks into memory instead of writing them as soon as they are received, and then write the whole file when the download completes. Alternatively, I could accumulate the small blocks in memory and set a larger cutoff size at which to do a disk write, so I would still write to disk in chunks, but chunks larger than the ones that come in over the wire.

I can do these things; I just don't yet know whether there is any compelling reason to. Would it perhaps increase reliability or something?

Thanks!


Solution

  • If you keep the whole file in memory, then you're tying up that much RAM for the duration of the download. If the file is really big, or if you download several files simultaneously that add up to "really big", you might even fail because you run out of memory (more likely on a 32-bit system than a 64-bit one). Then you'll wish you had saved as you went along, especially if the download took a long time and has to start over from scratch.

  • If you save part of the file and the operation fails, then you may be able to resume later from where you left off (https://stackoverflow.com/questions/5678119/how-does-a-ftp-server-resume-a-download); see the first sketch after this list.

  • If you save part of the file and you want to fail completely with no expectation of resuming, you probably have to write (a small amount of) extra code to delete the broken part-file.

  • If you're going to save as you go, then there's not much benefit in waiting to write multiple chunks to disk at once. Usually your network I/O will be much slower than your disk I/O, but if that's not the case, it may be more efficient to do fewer, larger disk writes. Even then, you can get that just by setting the buffer size of the file you're writing to (for example, via the buffering argument to open()); there's no need to write your own code to keep a certain amount of data hanging around. See the second sketch after this list.
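
A minimal sketch of the resume idea from the second bullet, assuming ftplib's retrbinary with its rest argument and a server that supports REST; the host, credentials, and file names are illustrative placeholders:

    import ftplib
    import os

    # Illustrative connection details and paths; replace with your own.
    HOST, USER, PASSWORD = "ftp.example.com", "user", "password"
    REMOTE, LOCAL = "remote.bin", "local.bin"

    ftp = ftplib.FTP(HOST)
    ftp.login(USER, PASSWORD)

    # Pick up from however many bytes an earlier, failed attempt left behind.
    offset = os.path.getsize(LOCAL) if os.path.exists(LOCAL) else 0

    with open(LOCAL, "ab") as f:  # append to the existing partial file
        ftp.retrbinary("RETR " + REMOTE, f.write, rest=offset)

    ftp.quit()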

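And a minimal sketch of the buffering point from the last bullet; the names and the 1 MiB buffer size are illustrative assumptions, the point being only that open() coalesces the small blocks for you:

    import ftplib

    # Illustrative connection details; replace with your own.
    ftp = ftplib.FTP("ftp.example.com")
    ftp.login("user", "password")

    # A 1 MiB file buffer: each ~8 KiB network block lands in the buffer, and
    # the operating system sees far fewer, larger writes.
    with open("local.bin", "wb", buffering=1024 * 1024) as f:
        ftp.retrbinary("RETR remote.bin", f.write, blocksize=8192)

    ftp.quit()
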
I think the balance will normally be to write data to disk more or less as soon as you have it. There may be special cases that are different.

If it weren't for the fact that you're showing progress as you go, the code might be simpler if you downloaded the whole file into memory with a single read() call (I'm not sure whether there's an easy way to do that with ftplib in particular, but other download mechanisms are available). Since you receive the file in small blocks anyway, I doubt that writing as you go significantly complicates the code, but if it somehow does, that may also be worth thinking about; a minimal sketch of the write-as-you-go approach with progress reporting follows.
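
Such a sketch, under the assumption of ftplib and placeholder connection details (and noting that not every server honours the SIZE command used to get the total), might look like this:

    import ftplib

    def download_with_progress(host, user, password, remote_path, local_path):
        ftp = ftplib.FTP(host)
        ftp.login(user, password)

        try:
            total = ftp.size(remote_path)   # some servers reject SIZE
        except ftplib.error_perm:
            total = None
        received = 0

        with open(local_path, "wb") as f:
            def on_block(block):
                nonlocal received
                f.write(block)              # write each block as soon as it arrives
                received += len(block)
                if total:
                    print(f"\r{received * 100 // total}% downloaded", end="")

            ftp.retrbinary("RETR " + remote_path, on_block, blocksize=8192)

        print()
        ftp.quit()

    # Illustrative usage:
    # download_with_progress("ftp.example.com", "user", "password", "remote.bin", "local.bin")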

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow