Question

I have the following class:

class VideoFile(models.Model):
    media_file = models.FileField(upload_to=update_filename, null=True)

When I upload large files to it (from 100 MB up to 2 GB) via the following view, there is a long delay after the upload itself completes, during the VideoFile.save() call.

def upload(request):
    # uploaded_file comes from request.FILES (form field name assumed)
    uploaded_file = request.FILES['media_file']
    video_file = VideoFile.objects.create(uploader=request.user.profile)
    video_file.media_file = uploaded_file
    video_file.save()

On my MacBook Pro (Core i7, 8 GB RAM), a 300 MB uploaded file can take around 20 seconds in video_file.save().

I suspect this delay is related to a disk copy operation from /tmp to the file's permanent location. I've confirmed this by running watch ls -l on the target directory: as soon as video_file.save() runs, the file appears and grows throughout the delay.

Is there a way to eliminate this file transfer delay, either by uploading the file directly to the target filename or by moving the original file instead of copying it? This is not the only upload operation across the site, however, so any solution needs to be localized to this model.

Thanks for any advice!

UPDATE:

Just further evidence to support a copy rather than a move: watching lsof during the upload, I can see a file within /private/var/folders/... being written from Python, which maps exactly to the upload progress. After the upload is complete, another lsof entry appears for the ultimate file location and grows over time. Once that is complete, both entries disappear.

Solution 2

OK, after a bit of digging I've come up with a solution. It turns out Django's default storage already attempts to move the file instead of copying it, which it first tests for with:

hasattr(content, 'temporary_file_path')

This attribute exists on the TemporaryUploadedFile class, which is the object handed to the upload view; however, the field's value is wrapped in the class specified by FileField.attr_class.
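
Roughly, the relevant branch inside Django's FileSystemStorage._save looks like this (paraphrased and simplified, not the verbatim source):

from django.core.files.move import file_move_safe

if hasattr(content, 'temporary_file_path'):
    # Disk-backed upload: a cheap rename when source and destination
    # share a filesystem, otherwise a copy followed by a delete.
    file_move_safe(content.temporary_file_path(), full_path)
else:
    # In-memory upload: stream it out chunk by chunk.
    with open(full_path, 'wb') as destination:
        for chunk in content.chunks():
            destination.write(chunk)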

So instead I decided to subclass FieldFile and FileField and slot in the temporary_file_path attribute:

class VideoFieldFile(FieldFile):
    # Set externally in the view so that the storage backend's
    # hasattr(content, 'temporary_file_path') check succeeds and it
    # moves the on-disk temp file instead of copying it.
    _temporary_file_path = None

    def temporary_file_path(self):
        return self._temporary_file_path


class VideoFileField(FileField):
    # Make the model field wrap its value in our FieldFile subclass.
    attr_class = VideoFieldFile
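
With the field class swapped in, the model from the question changes in just one place:

class VideoFile(models.Model):
    media_file = VideoFileField(upload_to=update_filename, null=True)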

Finally, in the view, before saving the model, I manually assigned the temp path:

video_file.media_file._temporary_file_path = uploaded_file.temporary_file_path()

This means my 1.1 GB test file now becomes available in about 2-3 seconds, rather than the 50 seconds or so I was seeing before. It also comes with an added benefit: if the two paths are on different file systems, it appears to fall back to the copy operation.

As a side note, my site does not use MemoryFileUploadHandler, which some sites may rely on for smaller uploads, so I'm not sure how well my solution would interact with it; but it should be simple enough to detect the uploaded file's class and act accordingly, as sketched below.
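
Something like this minimal sketch would do it (the 'media_file' form field name is an assumption; in-memory uploads simply skip the shortcut and take the normal streaming save):

from django.core.files.uploadedfile import TemporaryUploadedFile

def upload(request):
    uploaded_file = request.FILES['media_file']
    video_file = VideoFile.objects.create(uploader=request.user.profile)
    video_file.media_file = uploaded_file
    # Only disk-backed uploads have a real temporary file to move.
    if isinstance(uploaded_file, TemporaryUploadedFile):
        video_file.media_file._temporary_file_path = uploaded_file.temporary_file_path()
    video_file.save()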

OTHER TIPS

I would caution that there are quite a few reasons why uploading to /tmp and then copying over is considered best practice, and that uploading large files directly to their target is a potentially dangerous operation.

But what you're asking is absolutely possible. Django defines upload handlers:

You can write custom handlers that customize how Django handles files. You could, for example, use custom handlers to enforce user-level quotas, compress data on the fly, render progress bars, and even send data to another storage location directly without storing it locally.
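
As a rough illustration only (the handler name and VIDEO_UPLOAD_DIR are mine, and a real version would need collision-safe naming, error handling, and cleanup), a handler that streams chunks straight into the final directory could look like this:

import os

from django.core.files.uploadedfile import UploadedFile
from django.core.files.uploadhandler import FileUploadHandler, StopFutureHandlers

VIDEO_UPLOAD_DIR = '/srv/media/videos'  # placeholder target directory


class DirectToTargetUploadHandler(FileUploadHandler):
    def new_file(self, *args, **kwargs):
        super().new_file(*args, **kwargs)
        # self.file_name is populated by the base class from the request.
        self.target_path = os.path.join(VIDEO_UPLOAD_DIR, self.file_name)
        self.destination = open(self.target_path, 'wb')
        raise StopFutureHandlers()  # this handler owns the file from here on

    def receive_data_chunk(self, raw_data, start):
        self.destination.write(raw_data)
        return None  # chunk consumed; nothing passes to later handlers

    def file_complete(self, file_size):
        self.destination.close()
        # Hand Django an UploadedFile that already points at the final path.
        return UploadedFile(
            file=open(self.target_path, 'rb'),
            name=self.file_name,
            content_type=self.content_type,
            size=file_size,
            charset=self.charset,
        )

You would enable it per-view by assigning request.upload_handlers before the first access to request.POST or request.FILES.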

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow