Getting the contents of django-storages file for processing

https://stackoverflow.com/questions/12044698

27-06-2021

Question

When I was serving media locally and needed to process a file in a task, getting the file contents was very straightforward. However, I just switched to django-storages and it's not a drop-in replacement. Can someone provide a method that will pull the document off S3 so I can process it?

Old way:

filename = settings.MEDIA_ROOT + "/" + document.name
xlsx = XLSXParser(filename = filename, uniq_header_column='XYX')

Now that the files live on S3, this (obviously) will not work. How do you pull a local copy of the file from S3 to process it? I thought I could simply do this:

New (failing) way:

filename = settings.MEDIA_ROOT + "/" + document.name

if not os.path.isfile(filename):
    new_filename = tempfile.NamedTemporaryFile(delete=False)
    new_filename.write(document.read())
    filename = new_filename

xlsx = XLSXParser(filename = filename, uniq_header_column='XYX')

But I can't call read() on the document; it fails with this traceback:

Traceback (most recent call last):
  File ".../celery/task/trace.py", line 212, in trace_task
    R = retval = fun(*args, **kwargs)
  File ".../tasks.py", line 63, in process_homes
    process_homes_non_task(**kwargs)
  File ".../tasks.py", line 33, in process_homes_non_task
    new_filename.write(document.read())
  File ".../django/core/files/utils.py", line 16, in <lambda>
    read = property(lambda self: self.file.read)
  File ".../django/db/models/fields/files.py", line 46, in _get_file
    self._file = self.storage.open(self.name, 'rb')
AttributeError: 'FieldFile' object has no attribute 'storage'

In the end I need it to work with both the old way and the new way. Clearly I am overthinking this a bit.

Update:

Following the docs didn't help either.

filename = settings.MEDIA_ROOT + "/" + document.name
if not os.path.isfile(filename):
    from django.core.files.storage import default_storage
    s3_file = default_storage.open(document.name, 'rb')
    new_filename = tempfile.NamedTemporaryFile(delete=False)
    new_filename.write(s3_file.read())
    filename = new_filename

xlsx = XLSXParser(filename = filename, uniq_header_column='Lot_Number')
xlsx.load_workbook_and_sheet()

Thanks for the help.

Solution

Use Django's File object and Storage API; both are made for exactly this use case. For specific extensions, check this excellent app and pick the S3 storage backend.
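
A minimal sketch of what that could look like for the case in the question, not a definitive implementation. It assumes the parser needs a path string rather than an open file object, and that the storage object follows Django's Storage API (for example `django.core.files.storage.default_storage`), whose `open(name, mode)` returns a file-like object. The helper names `copy_to_local_tempfile` and `local_path_for` are hypothetical.

```python
import os
import shutil
import tempfile


def copy_to_local_tempfile(source, suffix=".xlsx"):
    """Copy a binary file-like object into a named temp file; return its path."""
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        # Copy in chunks so a large spreadsheet is not held in memory at once.
        shutil.copyfileobj(source, tmp)
    # Leaving the with-block closes (and flushes) the temp file, so the
    # path is safe to hand to a parser that reopens it by name.
    return tmp.name


def local_path_for(document, media_root, storage):
    """Return (path, is_temporary) for a file that may be local or remote.

    `document` only needs a `.name` attribute (a FieldFile has one);
    `storage` only needs an `open(name, mode)` method, as in Django's
    Storage API.
    """
    candidate = os.path.join(media_root, document.name)
    if os.path.isfile(candidate):
        # Old way: the file is served from local media.
        return candidate, False
    # New way: stream the file out of the storage backend.
    with storage.open(document.name, "rb") as remote:
        return copy_to_local_tempfile(remote), True
```

In a Django project this might be called as `path, is_temp = local_path_for(document, settings.MEDIA_ROOT, default_storage)`, with `path` (a string, not the temp file object) passed to the parser as `filename`. Since the temp file is created with `delete=False`, the caller would remove it with `os.remove(path)` once `is_temp` is true and processing has finished.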

Licensed under: CC-BY-SA with attribution
