Question

I'm looking for a grammar for PDF 1.7 (BNF or a variant).

I have not been able to find one by searching the web.

Solution

PDF is a binary format whose syntax is not context-free. For example, you have to read and interpret the declared length of a binary stream (the /Length entry in its dictionary) before you can parse the stream itself.

Example:

10 0 obj
<</Type /XObject
/Subtype /Image
/Width 260
/Height 52
/ColorSpace /DeviceRGB
/SMask 10 0 R
/BitsPerComponent 8
/Filter /FlateDecode
/Length 4570>> stream
--- insert binary data here ---
endstream
endobj

The binary data may itself contain the byte sequences endstream or endobj, so scanning for those keywords is not reliable. You have no choice but to read the length of the stream first and then consume exactly that many bytes.
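To make the point concrete, here is a minimal Python sketch of a length-driven stream reader next to a naive keyword scan. The function names `read_stream` and `naive_read_stream` are made up for this illustration, and the sketch assumes a single object held in memory whose /Length is a direct integer (in real files it can also be an indirect reference such as `12 0 R`, which this does not handle).

```python
import re

def read_stream(obj: bytes) -> bytes:
    """Return the raw payload of one PDF stream object, trusting /Length."""
    # Locate the end of the dictionary and the "stream" keyword with its end-of-line.
    kw = re.search(rb">>\s*stream\r?\n", obj)
    if kw is None:
        raise ValueError("no stream keyword found")
    # Assumption: /Length is a direct integer inside the dictionary before the keyword.
    m = re.search(rb"/Length\s+(\d+)", obj[:kw.start()])
    if m is None:
        raise ValueError("no direct /Length entry found")
    length = int(m.group(1))

    start = kw.end()
    data = obj[start:start + length]
    if len(data) != length:
        raise ValueError("stream is shorter than /Length")
    # After exactly `length` bytes, an optional end-of-line and "endstream" must follow.
    if not re.match(rb"\r?\n?endstream", obj[start + length:]):
        raise ValueError("endstream not found where /Length says it should be")
    return data

def naive_read_stream(obj: bytes) -> bytes:
    """Scan for the first 'endstream' keyword; breaks on payloads that contain it."""
    start = re.search(rb">>\s*stream\r?\n", obj).end()
    return obj[start:obj.index(b"endstream", start)]

# A payload that happens to contain the keywords themselves.
payload = b"\x00\x01endstream\nendobj\xff"
obj = (b"1 0 obj\n<</Length %d>> stream\n" % len(payload)
       + payload + b"\nendstream\nendobj\n")

print(read_stream(obj) == payload)        # length-driven read
print(naive_read_stream(obj) == payload)  # keyword scan
```

The length-driven reader recovers the payload unchanged, while the keyword scan stops at the first `endstream` it sees inside the binary data. That dependency on a previously parsed number is exactly what a context-free grammar cannot express.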

BNF can only describe context-free grammars, so it cannot capture this length-dependent part of the format, and it is not possible to construct a complete BNF grammar for PDF.

Take a look at the specification here: PDF Reference Document

OTHER TIPS

I am not aware of any formal specification of the PDF file format in the form of a grammar, BNF or not.

But I do know that ISO technical committee 171/SC2, which is currently working on the PDF 2.0 specification, has an agenda item "Updates from ad hoc committees: [...] iv. File format syntax for validating PDF files (L. Rosenthol)" for its next face-to-face meeting in Berlin, Sept 11-12, 2012. I take that agenda item to mean that more people are interested in a more formal description of the PDF syntax... :-)

Leonard Rosenthol is an Adobe PDF higher-up, and he frequently answers questions in the Adobe user forums. Maybe it would be a good idea to ask the question there? Chances are you'll get a better answer there than here.

Licensed under: CC-BY-SA with attribution