Question

I'm in the process of redesigning my web app which is mostly derived of user photo's. Our photo volume is generally in the low millions or upwards to 35 per user.

Anyhow what I'm looking to do is redesign how we store the photo's on the file system and reference them in the database. Our current system works, but not without it's flaws.

Currently I store them as such

user database table

pk 1 
photo_count 12

Which ends up becoming directory

storage/000/000/000/000/001/1_640x480.png

thru

storage/000/000/000/000/001/12_640x480.png

directory is derived from user pk

first number in the file name is the sort order

the file name refers to the photo size.

This is a very efficient way to store the photo's in the database, however it doesn't come without it's flaws. Whenever the sort changes we must first write the changes to a temp directory then overwrite all the photos in the main directory which isn't all that efficient. We also export photo's to other sites and the problem with our current system is if a photo is modified the name never changes thus the third party site never knows to refresh photo's from the feed. Last major issue has to do with the photo count falling out of sync with the directory count. This results in us generating photo URLs from the database photo_count which may or may not exist which causes some third party sites to fail on the photo import job.

my purposed solution was to do the following, but I'd like an expert opinion.

user database table
pk "1"
photos "stored as a comma separated list of photo names generated from SHA-1" example: 

f56c0de1c61fdb926e79e8a0a65bd12930c9.jpg,ec1c55bfb660548a6770238668c4b117d92f.jpg

my thought was to store the photo sort order in the list, so if the order changes all I'd have to do is rearrange the list rather than rename photos.

I figured I could probably continue deriving my directory structure from the user pk although I'd prefer using some sort of hash, I'm just not sure how to reference it in the database if that was the preferred method. Does it just get stored in another column? example

00e4 becomes /00/e4/

The only other issue I seem to run into is the photo size assuming I still need to store thumbnails. Would it be recommended to suffix the file name with _thumb.jpg?

I think this would all address third party feeds since every photo gets a unique name which is always changed upon modification.

Does anybody have an expert opinion on this subject? I'm not sure this is the best solution, so I'd like to hear what others are doing. Thanks so much.

Was it helpful?

Solution

Not sure I understood the question completely, but I'll describe they way we solve the same task (with the only difference that we made a library of arbitrary documents).

Primary DB entry is docs table:

Id              -- file id, pk
UserId          -- owner id
Size            -- file size in bytes
FileName        -- name to display like 'img.jpg'
ContentType     -- image/jpg
FileLocation    -- location of the file on filesystem.
                -- in our case it was something like {storage-root}/{userid}/{guid}
ThumbLocation   -- location of a preview version of the same file.
Created         -- upload time

So upload algorithm:

  • Get a file from a user.
  • Create empty Document object conforming to the structure of the table above.
  • Set doc.FileName to the original filename ("img.jpg").
  • Caclulate destination folder at filesystem per template '/files/{userid}'.
  • Generate unique file name (Guid.New()) and save its value in doc.FileLocation.
  • Write file to doc.FileLocation.
  • Set doc.Size to the original file size.
  • If it's an image, generate preview file, generate it's name: doc.ThumbLocation = doc.FileLocation + "thumb".
  • Write thumb file.
  • save doc to DB.

When user requests file list (by url like /files/{userid}):

  • Read all user's documents from a database
  • (Note that sorting options are not affected by physical files and their locations. All required information is already in the table).
  • Show user a list of documents.
  • Each record shows doc.FileName, doc.Size, and a link pointing to /files/{userid}/{fileid}

When user requests a file by url like /files/{userid}/{fileid}):

  • Read doc from the database
  • Load file from doc.FileLocation
  • Send it to the user.

As a conclusion: we manage physical files and locations only on upload time. The rest of operations are based only on the data we have put to the database. Then getting files content we rely on paths written to the table as it and do not bother their exact meaning.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top