Question

The Java web app I'm developing lets users upload files (pictures and documents) to their profiles and define access rules for those files, i.e. which of the other users are able to view or download each file. The access control / permission system is custom-made, and the rules are stored in MongoDB alongside the user's profile and the actual file entry.

Since the application and its storage need to be distributed and fault-tolerant, I need to figure out the best strategy for file storage.

Should I store the files inside MongoDB, in the files collection where the file document containing the description and access rules is located?

Or should I store the files on the server's file system and keep the path in the MongoDB document? With the filesystem approach, will I still be able to enforce the user-defined access permissions, and how? Finally, with the filesystem approach, how do I distribute files across servers? Should I use dedicated servers for this, or can I store the files on the webapp servers or the MongoDB servers?

Thanks a lot for all your insights! Any help or feedback appreciated.

Alex


Solution

There are several alternatives:

  • put files in a storage service (e.g. S3): easy to operate and plenty of space, but slower than local storage because every read is a network request
  • put files on a local filesystem: fast, but it doesn't scale beyond a single server
  • put files in MongoDB documents: easy, powerful and scalable, but each document is limited to 16 MB
  • use MongoDB's GridFS layer. Its functionality is limited, but it is made for scalability (thanks to sharding) and is fairly fast too. Note that you can put information about the file (permissions etc.) right into the file's metadata object.
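
To make the size trade-off between the last two options concrete, here is a minimal, driver-free sketch. It assumes MongoDB's 16 MiB maximum BSON document size and the 255 KiB default GridFS chunk size. The class name `StorageChoice`, its methods, and the `headroomBytes` parameter are hypothetical names made up for this illustration, not part of any MongoDB API.

```java
// Illustrative sketch only: class, enum and method names are hypothetical.
final class StorageChoice {
    // Maximum size of a single BSON document in MongoDB: 16 MiB.
    static final long MAX_BSON_DOCUMENT_BYTES = 16L * 1024 * 1024;
    // Default GridFS chunk size: 255 KiB.
    static final long DEFAULT_GRIDFS_CHUNK_BYTES = 255L * 1024;

    enum Strategy { EMBED_IN_DOCUMENT, GRIDFS }

    // Picks a storage strategy from the file size alone. headroomBytes is an
    // assumed allowance for the description, access rules and BSON overhead
    // that share the document with the file bytes.
    static Strategy choose(long fileBytes, long headroomBytes) {
        if (fileBytes < 0 || headroomBytes < 0) {
            throw new IllegalArgumentException("sizes must be non-negative");
        }
        return fileBytes + headroomBytes <= MAX_BSON_DOCUMENT_BYTES
                ? Strategy.EMBED_IN_DOCUMENT
                : Strategy.GRIDFS;
    }

    // Number of chunk documents needed for a file of the given size
    // (ceiling division; an empty file needs no chunks).
    static long chunkCount(long fileBytes, long chunkBytes) {
        if (fileBytes < 0 || chunkBytes <= 0) {
            throw new IllegalArgumentException("invalid sizes");
        }
        return (fileBytes + chunkBytes - 1) / chunkBytes;
    }
}
```

For example, under these assumptions a 10 MiB picture could still be embedded in its document, while a 16 MiB upload could not, because the document's other fields also count against the limit. In practice you may prefer GridFS well below that threshold, since large embedded documents are loaded whole.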

In your case it sounds like the last option may be best. Quite a few users have switched from the filesystem to GridFS and it worked very well for them. Things to keep in mind:

  • GridFS sharding works but is not perfect: usually only the data (the chunks) is sharded, not the metadata. That is not a big deal, but the shard holding the metadata must be very safe.
  • it can be beneficial to run GridFS in a MongoDB cluster separate from your core data, since the requirements (storage, backup, etc.) are usually different.
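
On the permissions part of the question: wherever the bytes end up, the access rules can be enforced in the application before it streams anything back, as long as files are never served directly from a public path. Below is a minimal sketch of such a check. `FileAccess`, `FileEntry` and their fields are hypothetical stand-ins for whatever your file document (or GridFS metadata object) actually contains, not an existing API.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only: all names here are hypothetical.
final class FileAccess {
    // Stand-in for the stored file document or GridFS metadata object.
    static final class FileEntry {
        final String ownerId;
        final Set<String> allowedViewerIds;

        FileEntry(String ownerId, Set<String> allowedViewerIds) {
            this.ownerId = ownerId;
            // Defensive copy so later changes to the caller's set have no effect.
            this.allowedViewerIds =
                    Collections.unmodifiableSet(new HashSet<>(allowedViewerIds));
        }
    }

    // True if the user owns the file or was explicitly granted access.
    // Anonymous callers (null user id) are always refused.
    static boolean canDownload(FileEntry entry, String userId) {
        if (userId == null) {
            return false;
        }
        return userId.equals(entry.ownerId)
                || entry.allowedViewerIds.contains(userId);
    }
}
```

The download handler would load the entry, call `canDownload` with the authenticated user's id, and only then open the stream from GridFS, the filesystem or S3. The same check works for all of the storage options above, which is why the choice of store does not have to constrain your custom permission system.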
Licensed under: CC-BY-SA with attribution