Question

I host a small file server to which users from all around the world can upload documents.

Because of encoding problems (see otherquestion), I am wondering whether I should prevent users from uploading (and, in turn, downloading) files whose names are not supported by the CP1252 charset.

Or, put the other way round: does it make sense to allow users to upload documents with Arabic or Chinese characters in their filenames?

PS: Users download the same file some time later, and it should have the same filename as when it was uploaded.


Solution

You should store the files on disk under a randomly generated name, or base the file name on a hash of the file contents (which also helps deduplicate storage). Save the original file name as metadata in a database, together with all other metadata about the file (who uploaded it and so on). Then serve the file through a PHP script that sets the original file name from the database in an HTTP header. This way you:

  • don't need to worry about file name sanitisation or duplicate file names
  • avoid file system encoding issues
  • avoid storing the same content twice (if using a hash)
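
The approach above can be sketched roughly as follows. This is a minimal illustration, not a finished implementation, and it is written in Python only for brevity; the same steps carry over to a PHP script. The names `save_upload`, `serve_download`, `content_disposition`, `blobs` and `files` are made up for this example, and the two dictionaries stand in for the disk and the database. The answer does not name the HTTP header; the sketch assumes `Content-Disposition` with the `filename*` parameter from RFC 6266, which carries a UTF-8 name in percent-encoded form.

```python
import hashlib
import uuid
from urllib.parse import quote

# Hypothetical in-memory stand-ins: `blobs` plays the role of the disk,
# `files` the role of the metadata table in the database.
blobs = {}
files = {}


def save_upload(original_name, content, uploader):
    """Store the content under its hash and keep the original name as metadata."""
    blob_key = hashlib.sha256(content).hexdigest()  # ASCII-only name on disk
    blobs.setdefault(blob_key, content)  # identical uploads share one blob
    file_id = uuid.uuid4().hex  # random identifier handed back to the user
    files[file_id] = {"name": original_name, "uploader": uploader, "blob": blob_key}
    return file_id


def content_disposition(original_name):
    """Build a Content-Disposition value that carries the original file name."""
    # Plain `filename` fallback for old clients: anything outside printable
    # ASCII, plus quotes and backslashes, is replaced by an underscore.
    fallback = "".join(
        ch if 32 <= ord(ch) < 127 and ch not in '"\\' else "_"
        for ch in original_name
    )
    # `filename*` carries the real name as percent-encoded UTF-8 bytes.
    encoded = quote(original_name, safe="")
    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{encoded}"


def serve_download(file_id):
    """Return the response headers and body for a stored file."""
    record = files[file_id]
    headers = {"Content-Disposition": content_disposition(record["name"])}
    return headers, blobs[record["blob"]]
```

Because the name on disk is always a hexadecimal hash, the file system never sees Arabic, Chinese or any other non-ASCII characters. The original name only travels through the database and the response header, so there is no need to restrict uploads to CP1252.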
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow