Question

I have the following segment of Markdown with embedded LaTeX equations:

# Fisher's linear discriminant

\newcommand{\cov}{\mathrm{cov}}
\newcommand{\A}{\mathrm{A}}
\renewcommand{\B}{\mathrm{B}}
\renewcommand{\T}{^\top}

The first method to find an optimal linear discriminant was proposed by Fisher
(1936), using the ratio of the between-class variance to the within-class variance
of the projected data, $d(\vec x)$, as a criterion. Expressed in terms of the
sample properties, the $p$-dimensional centroids $\bar {\vec x}_\A$ and
$\bar {\vec x}_\B$ and the $p \times p$ covariance matrices
$S_A = \cov_i ( \vec x_{\A i} )$ and $S_B = \cov_i ( \vec x_{\B i} )$, the
optimal direction is given by 
$$
\vec w = \left ( \frac{ S_A + S_B }{2} \right ) ^{-1}
~ ( \bar {\vec x}_\B - \bar {\vec x}_\A ).
$$

When I convert it with pandoc to LaTeX and compile it with xelatex, I get the expected text with nicely rendered math. When I convert it with pandoc to MS Word using

pandoc test.text -o test.docx

and open it in MS Office Word 2007, I get the following:

[Screenshot: the document as rendered in Word]

Only those parts of the equations that are symbols or upright text get rendered correctly, while variable names in italics are replaced by a question mark in a box.

How can I make this work?


Solution

In Word 2007, I see a result similar to yours, except that I don't see the "question marks in boxes" characters, just blank space.

If I then take one of the expressions, and use your trick of going to linear display and back, the characters reappear for that expression.

If I save and re-open, the other expressions still do not display correctly, but if I save and look at the XML, I notice that

  1. the Math font has been changed to Cambria Math
  2. additional run-properties XML (w:rPr) specifying the Cambria Math font has been inserted into many of the runs (m:r) inside the oMath elements, even in the oMath expressions that do not display correctly. However, in the oMath expression that now displays correctly, this extra XML has been applied to every run. In the others, it has only been applied to some runs (I think I can see the pattern but I'm running out of time here right now...)
  3. If I manually add the XML to the other runs and re-open the document, the expressions appear correctly. Or at least, they do in the one case I have tried.

Since Word 2010 displays the results correctly, I can only assume that it does not rely on these explicit font settings, whereas Word 2007 does. This doesn't really help you yet, because altering all those m:r elements would be even harder than what you are already doing. But it is possible that a default style/font needs to be set, either somewhere higher in the XML hierarchy or elsewhere in the .zip (perhaps in fontTable.xml or styles.xml). I'm not familiar enough with Word's XML structures to guess what, if anything, might be missing, but I may be able to have a look tomorrow.

I suppose another possibility is that you just have to have all these extra rPr elements for this to work in Word 2007, which would suggest that pandoc may have been written for Word 2010, not 2007. (I don't know anything about the tool).

As an example, where you have

<m:r>
  <m:t>(</m:t>
</m:r>

what you need is

<m:r>
  <w:rPr>
    <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math" />
  </w:rPr>
  <m:t>(</m:t>
</m:r>
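
If it does turn out that every run needs the explicit font, the manual edit could be scripted. The following is only a minimal sketch, not something tested against Word 2007: it assumes that word/document.xml uses the m: and w: prefixes exactly as in the example above, and it relies on a regular expression rather than a real XML parser. The function names add_math_fonts and patch_docx are made up for illustration.

```python
import re
import zipfile

# Run properties to inject (same XML as in the example above).
RPR = '<w:rPr><w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/></w:rPr>'

# Start of a math run, an optional m:rPr block (which has to stay first),
# and an optional already-present w:rPr opening tag.
RUN_START = re.compile(
    r'<m:r>(\s*<m:rPr\s*/>|\s*<m:rPr>.*?</m:rPr>)?(\s*<w:rPr[\s>/])?',
    re.DOTALL,
)

def add_math_fonts(document_xml: str) -> str:
    """Add the Cambria Math w:rPr to every m:r that has no w:rPr yet."""
    def fix(match):
        if match.group(2):              # run already carries a w:rPr
            return match.group(0)
        return '<m:r>' + (match.group(1) or '') + RPR
    return RUN_START.sub(fix, document_xml)

def patch_docx(src: str, dst: str) -> None:
    """Copy src to dst, rewriting only word/document.xml."""
    with zipfile.ZipFile(src) as zin, \
         zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                data = add_math_fonts(data.decode("utf-8")).encode("utf-8")
            zout.writestr(item, data)

# Example call (file names are placeholders):
# patch_docx("test.docx", "test-fixed.docx")
```

Runs that already have a w:rPr are left untouched, so running the script twice should not duplicate anything.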

Other suggestions

I did the following to get rid of the font issue:

  1. Create a new empty word document.
  2. Copy all content to the new document.
  3. Choose Match Source Format.

As discussed above, Windows doesn't have the font Lucida Grande, so substituting the Math Font with Cambria Math should work.

  1. Rename the test.docx to test.zip
  2. vim test.zip and select test/word/settings.xml
  3. find and change Lucida Grande to Cambria Math
  4. save and rename zip to docx. This results in something like this docx.
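
The same find-and-replace can be done without renaming the file or opening it in an editor. This is a rough sketch under the assumption that the font name appears literally in word/settings.xml, as the steps above suggest; the function name replace_in_settings is hypothetical.

```python
import io
import zipfile

def replace_in_settings(docx_bytes: bytes,
                        old: str = "Lucida Grande",
                        new: str = "Cambria Math") -> bytes:
    """Return a copy of the .docx with old replaced by new in word/settings.xml."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zin, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            # Only the settings part is touched; everything else is copied as-is.
            if item.filename == "word/settings.xml":
                data = data.decode("utf-8").replace(old, new).encode("utf-8")
            zout.writestr(item, data)
    return out.getvalue()

# Example use (file names are placeholders):
# with open("test.docx", "rb") as f:
#     patched = replace_in_settings(f.read())
# with open("test-fixed.docx", "wb") as f:
#     f.write(patched)
```

Working on bytes in memory keeps the original test.docx untouched until the patched copy has been written out.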

You can then also supply that file as a sort of docx template to pandoc with the --reference-docx option (renamed to --reference-doc in pandoc 2.0 and later).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow