Question

I am working on software to store legal documents and I was thinking that PDF might be an ideal format to work in. However I am a little confused as to what would best suit my needs in this regard in the actual format of the PDF file.

I have the following requirements for the documents:

  • will be stored for a minimum of 7 years if not longer
  • not editable
  • contain both images and text (images will be in .jpg format ideally)

I was originally looking at using PDF/A-1 however I have discovered that this format does not seem to like using JPEG images, or at least it doesn't when using JODConverter.

Any suggestions/explanations as to which format would best meet these needs would be much appreciated!

Was it helpful?

Solution

For the requirements you described, PDF/A-1b (yes, b at the end!) is the ideal format. The b is for basic -- it has less strict requirements to meet than the PDF/A-1a (a at the end), which is for accessible (or advanced, as I mnemonic it).

If you have no difficulty implementing PDF/A-1a, you may as well go for it. However, depending on your source documents, PDF/A-1a may be extremely difficult and nearly impossible to generate (as it requires the additional tagging of the file's content for the accessibility features).

As for JPEG: of course PDF/A-1b supports JPEGs. It does not allow JPEG2000 compression to be used, because that algorithm was patent encumbered at the time of defining the PDF/A-1b standard. PDF/A-1b generating software therefor must re-compress objects using this type of compression with one of the other methods (which does not pose a big practical problem though.)

You may also want to look at the The PDF/A Competence Center (PDFA) website. (Disclosure: I'm a member of the PDFA.)

OTHER TIPS

PDF/A-1 is a good format for long-term storage (as that's it's intention) and so it tries to remove external dependencies. This includes some things like embedding fonts and DISABLING external hyperlinks (which makes sense also, but can be a gotcha). Some useful info is on the Adobe site (look at the key-specifications tab). PDF sounds like the right answer to your requirements.

The images being embedded should not be a problem. JODReports perhaps is doing something wrong (or the version of OpenOffice/LibreOffice you are using underneath). You could try switching parts of that underlying infrastructure (OO/LO), try experimenting directly from OpenOffice/LibreOffice GUI - export PDF/A-1 and see what the results are or try some other tools in the chain (eg Docmosis though that is based on similar technology).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top