Question

I'm trying to understand the RTF 1.9.1 specification, but #PCDATA (text without control words) is confusing me. Below is a sample that shows what I don't understand. Note that it is not laid out the way it would appear in an actual RTF file; I added line breaks and indentation to make it easier to read.

{
    \fonttbl
    {
        \f0
        \fbidi 
        \froman
        \fcharset0
        \fprq2
        {
            \*
            \panose 
            02020603050405020304
        }
        Times New Roman;
    }
}

The specification says:

If the character is anything other than an opening brace ({), closing brace (}), backslash (\), or a CRLF (carriage return/line feed), the reader assumes that the character is plain text and writes the character to the current destination using the current formatting properties.

If I followed the rule above, I would end up writing "Times New Roman" to the document. How is a parser supposed to know whether it has encountered #PCDATA or document text?
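The problem can be reproduced with a minimal Python sketch (not taken from the specification) of a reader that applies only the quoted rule and tracks no destinations. It assumes a simplified tokenizer: a control word is a backslash, ASCII letters, an optional signed number and an optional space delimiter, and any other backslash sequence is skipped as a two-character control symbol (so hex escapes such as \'e9 are not handled). The function name naive_plain_text is hypothetical.

```python
import string


def naive_plain_text(rtf: str) -> str:
    """Apply only the 'anything else is plain text' rule, ignoring destinations.

    Simplified tokenizer (an assumption, not the full RTF grammar):
    braces and CR/LF are skipped; a control word is a backslash, ASCII
    letters, an optional signed number and an optional space delimiter;
    any other backslash sequence is skipped as a two-character control symbol.
    """
    out = []
    i, n = 0, len(rtf)
    while i < n:
        ch = rtf[i]
        if ch in "{}\r\n":
            i += 1  # group delimiters and line breaks are not text
        elif ch == "\\":
            i += 1
            if i < n and rtf[i] in string.ascii_letters:
                while i < n and rtf[i] in string.ascii_letters:
                    i += 1  # control word name
                if i + 1 < n and rtf[i] == "-" and rtf[i + 1] in string.digits:
                    i += 1  # sign of a negative parameter
                while i < n and rtf[i] in string.digits:
                    i += 1  # numeric parameter
                if i < n and rtf[i] == " ":
                    i += 1  # the space delimiter belongs to the control word
            else:
                i += 1  # control symbol such as \* (hex escapes not handled)
        else:
            out.append(ch)  # everything else is written as plain text
            i += 1
    return "".join(out)
```

Run on the question's sample with the indentation removed, r"{\fonttbl{\f0\fbidi \froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}}", it returns "02020603050405020304Times New Roman;": both the panose digits and the font name leak into the output.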


Solution

The answer is on page 9 of the RTF 1.9.1 specification.

Certain control words, referred to as destinations, mark the beginning of a collection of related text that could appear at another position, or destination, within the document. Destinations may also include text that is used but does not appear within the document at all.

In the example from the question, \fonttbl is a destination control word, so the text in its group (the font name Times New Roman) is written to the font table rather than appearing in the document. Page 11 of the specification lists examples of control words that change the destination:

Examples of control words that change destination are \footnote, \header, \footer, \pict, \info, \fonttbl, \stylesheet, and \colortbl.

There are many more, but those are the main ones.
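Here is a minimal Python sketch of how a reader can use destinations, under the same simplified tokenizer assumptions as a naive reader: each group starts with its parent's destination, a destination control word switches the current group's destination, and plain text is routed to whichever destination is active. Only the example destinations from page 11 are recognized; a real reader needs the full list. It also applies the RTF convention that \* marks the following control word as a destination a reader may ignore if it does not recognize it, so a group such as {\*\panose ...} is skipped. The names split_destinations, KNOWN_DESTINATIONS, IGNORE and "body" are hypothetical.

```python
import string

# The example destinations listed on page 11 of the RTF 1.9.1 spec.
# A real reader needs the full list from the specification.
KNOWN_DESTINATIONS = {
    "footnote", "header", "footer", "pict",
    "info", "fonttbl", "stylesheet", "colortbl",
}
IGNORE = None  # hypothetical marker: text in this destination is discarded


def split_destinations(rtf: str) -> dict:
    """Route plain text to the destination active in the current group."""
    outputs = {"body": []}
    stack = []     # destination in effect before each enclosing group
    dest = "body"  # hypothetical name for the main document text
    star = False   # True right after the \* control symbol
    i, n = 0, len(rtf)
    while i < n:
        ch = rtf[i]
        if ch == "{":
            stack.append(dest)  # a new group starts with its parent's destination
            i += 1
        elif ch == "}":
            dest = stack.pop() if stack else "body"  # undo this group's changes
            star = False
            i += 1
        elif ch in "\r\n":
            i += 1
        elif ch == "\\":
            i += 1
            if i < n and rtf[i] in string.ascii_letters:
                start = i
                while i < n and rtf[i] in string.ascii_letters:
                    i += 1
                word = rtf[start:i]
                if i + 1 < n and rtf[i] == "-" and rtf[i + 1] in string.digits:
                    i += 1
                while i < n and rtf[i] in string.digits:
                    i += 1
                if i < n and rtf[i] == " ":
                    i += 1
                if dest is not IGNORE:
                    if word in KNOWN_DESTINATIONS:
                        dest = word  # e.g. fonttbl: this group's text is font data
                    elif star:
                        dest = IGNORE  # unrecognized \* destination: skip the group
                star = False
            else:
                star = i < n and rtf[i] == "*"
                i += 1  # control symbol (hex escapes not handled)
        else:
            if dest is not IGNORE:
                outputs.setdefault(dest, []).append(ch)
            i += 1
    return {name: "".join(chars) for name, chars in outputs.items()}
```

For example, with r"{\rtf1{\fonttbl{\f0\fbidi \froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}}\f0 Hello}" it returns {"body": "Hello", "fonttbl": "Times New Roman;"}: the font name ends up in the font table, the panose data is discarded, and only "Hello" reaches the document.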

Licensed under: CC-BY-SA with attribution