Should I dynamically recreate a PDF, rather than store it in either the database or the filesystem?

StackOverflow https://stackoverflow.com/questions/227727

Question

I need customers to be able to download PDFs of letters that have been sent to them.

I have read the threads about database versus filesystem storage of documents or images, and it does sound like the consensus is that, for anything more than just a few images, filesystem is the way to go.

What I want to know:

  • Would a reasonable alternative be to just store the letter details in the database, and recreate the PDF 'on the fly' when it is requested?
  • Is that approach superior or inferior to fetching the PDF from the filesystem?

Solution

I'd store it, for two reasons:

1) If you ever change how you generate the PDF, you probably don't want historical items to change. If you regenerate them every time, either they will change or you will need to keep compatibility code to generate "old-style" records.

2) Disk space is cheap. Users' patience isn't. Unless you're really pressed for storage, or retrieving the stored file is harder than generating the PDF, be kind to your users and store it.

Obviously, if you create thousands of these an hour from a sparse dataset, you may not have the storage. But if you have the space, I'd vote for "use it".
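
To make the "compatibility code" cost in point 1 concrete, here is a minimal sketch of what regenerating on the fly would require: every letter row records the template version it was sent with, and every old renderer stays in the codebase forever. The renderer functions, the `template_version` field and the sample wording are all hypothetical, and the renderers return plain bytes as a stand-in for a real PDF library.

```python
from typing import Callable, Dict

# Hypothetical renderers. Each returns bytes standing in for a real PDF.
def render_v1(details: dict) -> bytes:
    # Original layout: no letterhead.
    return f"Dear {details['name']},\n{details['body']}".encode("utf-8")

def render_v2(details: dict) -> bytes:
    # Later layout: a letterhead was added, so the output differs from v1.
    return f"ACME Ltd\n\nDear {details['name']},\n{details['body']}".encode("utf-8")

# Every version ever used to send a letter has to stay registered here.
RENDERERS: Dict[int, Callable[[dict], bytes]] = {1: render_v1, 2: render_v2}
CURRENT_VERSION = 2

def new_letter(name: str, body: str) -> dict:
    """Build the row that would be stored in the database for a new letter."""
    return {"name": name, "body": body, "template_version": CURRENT_VERSION}

def regenerate(letter_row: dict) -> bytes:
    """Recreate a letter using the template version it was originally sent with."""
    renderer = RENDERERS[letter_row["template_version"]]
    return renderer(letter_row)
```

Even with this bookkeeping, the output is only identical if the old renderer and everything it depends on (fonts, logos, the PDF library itself) never change, which is the argument for simply storing the file.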

OTHER TIPS

If it is for archival purposes, I would definitely store the PDF, because in future your PDF generation script may change and then the letter will not be exactly the same as what was originally sent. The customer will expect it to be exactly the same.

It doesn't matter which approach is superior; sometimes it is better to go with the approach that is safer.

Is there a forensic reason why you have to maintain records of letters sent to customers? If you regenerate on the fly, how do you know that future code changes won't alter the letter? At the very least, the customer could make that argument in court if the letter is used in a lawsuit.

I'm inclined to say "it depends".

When one document is requested many times, it may be cheaper to compose it on the first request and retrieve the stored copy subsequently.

OTOH, if most documents are requested just once, and the creation process doesn't eat up most of your server capacity, on-the-fly generation has a clear advantage.
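
The "compose on the first request, retrieve afterwards" idea can be sketched roughly as below. This is only an illustration: `render_pdf` is a hypothetical stand-in for whatever PDF generator you use, and the cache directory layout and file naming are assumptions.

```python
from pathlib import Path

def render_pdf(details: dict) -> bytes:
    """Hypothetical stand-in for a real PDF generator."""
    return ("%PDF-1.4\n% letter for " + details["name"] + "\n").encode("utf-8")

def get_letter_pdf(letter_id: int, details: dict, cache_dir: Path) -> bytes:
    """Return the stored PDF if it exists; otherwise generate it once and store it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"letter-{letter_id}.pdf"
    if path.exists():
        # Later requests: serve exactly the bytes produced the first time.
        return path.read_bytes()
    data = render_pdf(details)
    # Write to a temporary name, then rename, so a reader never sees a half-written file.
    tmp = path.with_suffix(".tmp")
    tmp.write_bytes(data)
    tmp.replace(path)
    return data
```

Note that once a copy is stored, later changes to the letter details no longer affect what is served, which is the behaviour most of the other answers are asking for.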

If you're using ASP.NET, why not cache the PDF? The cache can be backed by the database if you like, or kept in memory for as long as you need it. The Enterprise Library implements this for you in the Caching Application Block, and it's remarkably simple to use. If you cache the object, configure database backing storage through the block, and load it when you need it, you won't have to worry about re-creating it.

A few things to consider: is the PDF generated from data as it existed at some point in time, e.g. a bill based on data from the prior month?

If so, would you use the same template each month to generate this letter? What happens if/when the letter format changes? If you regenerate on the fly, it is no longer the same as what was sent to them. Is storing the PDF stream in the database a possibility?

I guess what I am getting at is: do you need an exact representation of what was sent to the user, or is that flexible?

The question of whether to generate the PDFs dynamically or store them statically sounds more like a question of law than a question of programming.

If you don't have access to legal counsel who can provide guidance on this, then it is going to be far safer to err on the side of caution and store them statically.

As long as the PDF document is of a permanent nature (not just a working document, but something official that was signed and sent elsewhere in the company or outside it), you should keep a copy of the PDF file on your network, and a link to that file in your database.

You cannot rely on the available data to reproduce the very same document at a different time, mainly because:

  1. Data can change (yes! suppose the letter was to be signed by the Head of Department, and the staff has since changed?)
  2. Your report format will change (header, footer, logo, etc.)
  3. The document you produced is kept by somebody else, who will rely on the data it contains.
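
A rough sketch of the "copy of the file on the network, link in the database" arrangement, using SQLite purely for illustration. The `letters` table, its column names and the function names are assumptions, and the SHA-256 checksum is an extra that is not part of the answer above; it is one simple way to detect that an archived file was altered after it was sent.

```python
import hashlib
import sqlite3
from pathlib import Path

def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table: one row per letter, pointing at the archived file.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS letters ("
        "id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL, "
        "file_path TEXT NOT NULL, sha256 TEXT NOT NULL)"
    )

def archive_letter(conn: sqlite3.Connection, archive_dir: Path,
                   customer_id: int, pdf_bytes: bytes) -> int:
    """Write the PDF to disk once, and record its path and checksum in the database."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(pdf_bytes).hexdigest()
    # Naming the file after its checksum means identical content maps to one file.
    path = archive_dir / f"{digest}.pdf"
    path.write_bytes(pdf_bytes)
    cur = conn.execute(
        "INSERT INTO letters (customer_id, file_path, sha256) VALUES (?, ?, ?)",
        (customer_id, str(path), digest),
    )
    conn.commit()
    return cur.lastrowid

def load_letter(conn: sqlite3.Connection, letter_id: int) -> bytes:
    """Follow the database link to the file and check it is still what was sent."""
    row = conn.execute(
        "SELECT file_path, sha256 FROM letters WHERE id = ?", (letter_id,)
    ).fetchone()
    if row is None:
        raise KeyError(letter_id)
    data = Path(row[0]).read_bytes()
    if hashlib.sha256(data).hexdigest() != row[1]:
        raise ValueError("archived file does not match the recorded checksum")
    return data
```

The download handler then only needs the letter's id: it never touches the letter data or the current template, so none of the three problems listed above can change what the customer receives.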
Licensed under: CC-BY-SA with attribution