Question

I'm working on this weird project where I need to upload huge files (500 MB+) and store them in a PostgreSQL table as blobs.

Before you start telling me that storing files in the database is bad and that I should use the filesystem instead, hear (read) my reasons:

  1. Access to the files must be authorized through the app, where per-file permissions are enforced.
  2. I will need to keep a history (versioning) of each file, giving access to previous revisions.
  3. I can't afford the risk of losing consistency between the relations and the files/versions over the years.

Now, on to the questions:

  1. Do I risk getting timeouts during upload/download, or does an active transfer count as a 'keep-alive'?
  2. Are the files completely loaded into memory for each user, or are they buffered (streamed) to the user? I need to allow several thousand users to download big files at the same time.
  3. Are there any packages that can help me with this?

Solution

This sounds like a very tough nut to crack; I'm not even sure it can be cracked at all.

The database is going to become enormous, and hence slow, in a short period of time. With files this large, caching is not going to happen on any appreciable level.

Keeping multiple copies of such big files for versioning is going to require an inordinate amount of disk space (which is cheap, granted, but you are talking about years' worth of big files).
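To put a rough number on that, here is a back-of-the-envelope sketch. Only the 500 MB file size comes from the question; the file count and the number of versions per file are made-up placeholders, so substitute your own figures.

```python
# Rough storage estimate when every version is kept as a full copy.
# Only FILE_SIZE_MB comes from the question; the other inputs are
# hypothetical placeholders.
FILE_SIZE_MB = 500        # from the question: files of 500 MB and up
FILES = 1_000             # assumption: number of distinct files
VERSIONS_PER_FILE = 10    # assumption: versions kept per file


def full_copy_storage_tb(file_size_mb, files, versions):
    """Return total storage in decimal terabytes (1 TB = 1,000,000 MB)."""
    return file_size_mb * files * versions / 1_000_000


print(full_copy_storage_tb(FILE_SIZE_MB, FILES, VERSIONS_PER_FILE))  # 5.0
```

With those placeholder inputs the total is 5 TB before indexes, backups, or replication, and it grows linearly with every extra version.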

Are you sure you have such requirements? The way it's been envisioned, this project is going to require huge hardware resources (not to mention bandwidth, both incoming and outgoing).

If your customer/boss has the budget for it, then fine, but I just wanted to point this out. This is a tough challenge from an engineering standpoint, the kind of challenge that you want to avoid if at all possible.

The only thing I can think of to save some disk space is storing the first version of a file in full and then computing and storing binary deltas against it.
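The first-version-plus-deltas idea can be sketched as follows. This is a toy illustration only: it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for a real binary diff tool, and it would be far too slow and memory-hungry for 500 MB files. The function names `make_delta` and `apply_delta` are made up for this example.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, new: bytes) -> list:
    """Describe `new` as copies from `base` plus literal new bytes."""
    ops = []
    matcher = SequenceMatcher(None, base, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))       # reuse base[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("data", new[j1:j2]))   # bytes only present in `new`
        # "delete": those base bytes are simply not copied
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild a later version from the stored first version and a delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


# Version 1 is stored in full; version 2 is stored only as a delta.
v1 = bytes(range(256)) * 4
v2 = v1[:500] + b"PATCH" + v1[520:]

delta = make_delta(v1, v2)
literal_bytes = sum(len(op[1]) for op in delta if op[0] == "data")

assert apply_delta(v1, delta) == v2
print(f"version 2 is {len(v2)} bytes, delta carries {literal_bytes} literal bytes")
```

Only the first version and the list of operations need to be stored; any later version is rebuilt on demand by applying its delta to the first version. A production system would serialize and compress the delta, which is part of what dedicated tools do for you.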

Not knowing the nature of the binary files you're dealing with, I can only suggest that you look into bsdiff or even Google's Courgette (which is what they use to push Chrome updates out). Both have the added benefit of compressing in addition to generating a binary delta. You're going to have to drop to either C or C++ for that subsystem, though, because I'm not really sure PHP would cut it.

Both bsdiff and Courgette are geared towards executables, which can change wildly even after small source code modifications, so they could work even better for files that don't change that much from one version to the next.

And this is about all I can suggest.

Licensed under: CC-BY-SA with attribution