Question

I've been trying to work out the 'best practices' way to manage file uploads with Turbogears 2 and have thus far not really found any examples. I've figured out a way to actually upload the file, but I'm not sure how reliable it us.

Also, what would be a good way to get the uploaded files name?

    file = request.POST['file']
    permanent_file = open(os.path.join(asset_dirname,
        file.filename.lstrip(os.sep)), 'w')
    shutil.copyfileobj(file.file, permanent_file)
    file.file.close()
    this_file = self.request.params["file"].filename 
    permanent_file.close()

So assuming I'm understanding correctly, would something like this avoid the core 'naming' problem? id = UUID.

    file = request.POST['file']
    permanent_file = open(os.path.join(asset_dirname,
        id.lstrip(os.sep)), 'w')
    shutil.copyfileobj(file.file, permanent_file)
    file.file.close()
    this_file = file.filename
    permanent_file.close()
Was it helpful?

Solution

@mhawke - you're right you have to handle that - depends on what you are doing with the file, if it doesn't matter if there is a name collision eg you only care for the latest version of some data then theres probably no issue, or if the filename isn't actually important just the file contents, but its still bad practice.

You could use a named tempfile in a tmp dir, then move the file once validated to its final location. Or you could check the filename doesn't already exist like so:

file.name = slugify(myfile.filename)
name, ext = os.path.splitext(file.name)
while os.path.exists(os.path.join(permanent_store, file.name)):
    name += '_'
    file.name = name + ext

raw_file = os.path.join(permanent_store, file.name)

The slugify method would be used to tidy up the filename...

OTHER TIPS

I just want anyone who comes here looking for answers, to know that Allesandro Molina's great library Depot constitutes the best answer to this question.

It solves both the naming and copying issues, and it will incorporate nicely into your TurboGears application. You can use it with MongoDB GridFS, as in this example:

from depot.manager import DepotManager

# Configure a *default* depot to store files on MongoDB GridFS
DepotManager.configure('default', {
    'depot.backend': 'depot.io.gridfs.GridFSStorage',
    'depot.mongouri': 'mongodb://localhost/db'
})

depot = DepotManager.get()

# Save the file and get the fileid
fileid = depot.create(open('/tmp/file.png'))

# Get the file back
stored_file = depot.get(fileid)
print stored_file.filename
print stored_file.content_type

or you can easily create attachment fields in your SQLAlchemy models, like:

from depot.fields.sqlalchemy import UploadedFileField

class Document(Base):
    __tablename__ = 'document'

    uid = Column(Integer, autoincrement=True, primary_key=True)
    name = Column(Unicode(16), unique=True)

    content = Column(UploadedFileField)

… and then, storing documents with attached files (the source can be a file or bytes) becomes as easy as:

doc = Document(name=u'Foo', content=open('/tmp/document.xls'))
DBSession.add(doc)

Depot supports both LocalFileStorage, MongoDB's GridFSStorage, and Amazon's S3Storage. And, at least for files stored locally and in S3, the fileid will be generated by uuid.uuid1().

I don't know much about Turbogears and whether it can provide anything to avoid the following, but it seems to me that this code is fraught with danger. It may be possible for a malicious user to overwrite (or create) any file that the Turbogears python process has write access to.

What if asset_dirname is /tmp, the contents of file.filename is ../../../../../../../etc/passwd and the contents of the file root::0:0:root:/root:/bin/bash? In a UNIX environment this code (permissions pending) would open the file /tmp/../../../../../../../etc/passwd in truncate mode and then copy the contents of the uploaded file to it - effectively overwriting your system's password file and specifying a root user without a password. Presumably there are nasty things that can be done to a Windows machine too.

OK, this is an extreme example that requires that python is running as root (no one does that, do they?). Even if python is running as a low-priveleged user, previously uploaded files could be overwritten at will.

To summarise, don't trust user input, in this case the user supplied filename that is available in file.filename.

Isn't turbogears just pylons with extras? You could check out the help there:

http://wiki.pylonshq.com/display/pylonsdocs/Form+Handling#file-uploads

However, that still contains the potential security flaw that mhawke mentioned:

os.path.join(permanent_store, myfile.filename.lstrip(os.sep))

Same as above really if filename somehow was ../../../../../etc/passwd then you could replace that file...

So you could just get the actual filename like so:

os.path.join(permanent_store, myfile.filename.split(os.sep).pop())

Werkzeug has a very good helper function for securing file names called secure_filename. I think you can adopt and use it.

about how to go, i'd second the good answers already given.

here's my 2 pence about the stored file naming.

indeed saving the file using the original name may cause a vulnerability. the only use i make of the original name, if at all, is to hint the mime type detection.

anyway the files to be saved should be given unique names, by a record identity or something similar, and kept in a place under the control of the app directory owner, who is a regular user, or in some other storage service, as the aforementioned depot etc.

it's a matter of cross language good system design :).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top