Question

I have a django project which use django-storage over s3-boto.

Problem is that every file that is located on S3 unable to cache because the url is changed from each call.

here are two calls generated by django-storage :

https://my.s3.amazonaws.com/cache/user_6/profile_pic/profile_profile_picture_thumbnail.jpg?Signature=HlVSayUIJj6dMyk%2F4KBtFlz0uJs%3D&Expires=1364418058&AWSAccessKeyId=[awsaccesskey]     
https://my.s3.amazonaws.com/cache/user_6/profile_pic/profile_profile_picture_thumbnail.jpg?Signature=xh2VxKys0pkq7yHpbJmH000wkwg%3D&Expires=1364418110&AWSAccessKeyId=[awsaccesskey]

As you can see the signature is different. What can I do so it wont break my browser cache ?

Was it helpful?

Solution

In your settings, just add the following:

AWS_QUERYSTRING_AUTH = False

This will make sure that the URLs to the files are generated WITHOUT the extra parameters. Your URLs would look like:

https://my.s3.amazonaws.com/cache/user_6/profile_pic/profile_profile_picture_thumbnail.jpg

OTHER TIPS

When AWS_QUERYSTRING_AUTH = True (which is the default), django will generate a temporary url each time we fetch the url.

If you don't want to generate a temporary url:

Add AWS_QUERYSTRING_AUTH = False to your settings.py

If you still want a temporary url:

Temporary urls will be valid for AWS_QUERYSTRING_EXPIRE seconds (3600 by default). So we can cache this temporary url (as long as we don't cache it longer than it's valid). This way - we can return the same url for subsequent page requests, allowing the client browser to fetch from their cache.

settings.py

# We subclass the default storage engine to add some caching
DEFAULT_FILE_STORAGE = 'project.storage.CachedS3Boto3Storage'

project/storage.py

import hashlib

from django.conf import settings
from django.core.cache import cache
from storages.backends.s3boto3 import S3Boto3Storage

class CachedS3Boto3Storage(S3Boto3Storage):
    """ adds caching for temporary urls """

    def url(self, name):
        # Add a prefix to avoid conflicts with any other apps
        key = hashlib.md5(f"CachedS3Boto3Storage_{name}".encode()).hexdigest()
        result = cache.get(key)
        if result:
            return result

        # No cached value exists, follow the usual logic
        result = super(CachedS3Boto3Storage, self).url(name)

        # Cache the result for 3/4 of the temp_url's lifetime.
        try:
            timeout = settings.AWS_QUERYSTRING_EXPIRE
        except:
            timeout = 3600
        timeout = int(timeout*.75)
        cache.set(key, result, timeout)

        return result

Protect some file storage

Most of your media uploads – user avatars, for instance – you want to be public. But if you have some media that requires authentication before you can access it – say PDF resumes which are only accessible to members – then you don’t want S3BotoStorage’s default S3 ACL of public-read. Here we don’t have to subclass, because we can pass in an instance rather than refer to a class.

so first remove protection for all file fields in your settings and add cache control

AWS_HEADERS = {
    'Cache-Control': 'max-age=86400',
}

# By default don't protect s3 urls and handle that in the model
AWS_QUERYSTRING_AUTH = False

Then make the file field you need protected to use your custom protected path

from django.db import models
import storages.backends.s3boto

protected_storage = storages.backends.s3boto.S3BotoStorage(
  acl='private',
  querystring_auth=True,
  querystring_expire=600, # 10 minutes, try to ensure people won't/can't share
)

class Profile(models.Model):
  resume = models.FileField(
    null=True,
    blank=True,
    help_text='PDF resume accessible only to members',
    storage=protected_storage,
  )

But you also need to use your normal storage when you are in dev, and you are usually using local storage so this is how i personally went about it

if settings.DEFAULT_FILE_STORAGE == 'django.core.files.storage.FileSystemStorage':
    protected_storage = FileSystemStorage()
    logger.debug('Using FileSystemStorage for resumes files')
else:
    protected_storage = S3BotoStorage(
        acl='private',
        querystring_auth=True,
        querystring_expire=86400,  # 24Hrs, expiration try to ensure people won't/can't share after 24Hrs
    )
    logger.debug('Using protected S3BotoStorage for resumes files')

REF: https://tartarus.org/james/diary/2013/07/18/fun-with-django-storage-backends

Your best bet is to subclass the Boto S3 Storage Backend and override the url method.

/project/root/storage.py

from django.conf import settings
from storages.backends.s3boto import S3BotoStorage

class S3Storage(S3BotoStorage):

    def url(self, name):
        name = self._clean_name(name)
        return '{0}{1}'.format(settings.MEDIA_URL, name)

/project/root/settings.py

MEDIA_URL = 'https://my.s3.amazonaws.com/'
DEFAULT_FILE_STORAGE = 'project.storage.S3Storage'
AWS_ACCESS_KEY_ID = '*******'
AWS_SECRET_ACCESS_KEY = '********'
AWS_STORAGE_BUCKET_NAME = 'your-bucket'

Just make sure your images are publicly readable.

If you care about using signed urls but still want to cache them until they expire, just use django's built-in caching:

from django.core.cache import cache

class MyModel(models.Model):
    #using django-storages with AWS_QUERYSTRING_AUTH=True
    media = models.FileField()

    @property
    def url(self):
        """
        Return signed url, re-using cached value if not expired.
        """
        #Refresh url if within n seconds of expiry to give clients
        #time to retrieve content from bucket.
        #Make sure AWS_QUERYSTRING_EXPIRE is sufficiently larger than n
        n = 30
        time_to_expiry = settings.AWS_QUERYSTRING_EXPIRE - n

        #Create a unique key for this instance's url in the cache
        key = '{0}{1}'.format(self.__class__.__name__, self.pk)

        url = cache.get(key)
        if not url:
            url = self.media.url    #refresh url via django-storages
            cache.set(key, url, time_to_expiry)

        return url

Adding this line in your settings.py file will enable the caching for the image file.

AWS_S3_OBJECT_PARAMETERS = {
        'CacheControl': 'max-age=86400',
    }
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top