Question

Is there an efficient way to get the number of pages of a Word document (.doc, .docx) with Python?

And for an .odt file?

I want to use this for a web application based on Web2py on Linux.

Thank you !


Solution

You can read the value

<Properties>
<Pages>CountValue</Pages>

from docProps/app.xml in the docx package or

<office:document-meta>
    <office:meta>
        <meta:document-statistic meta:page-count="CountValue">

from meta.xml in the odt package.

If these values do not exist (they are optional), you have to compute the page count yourself, which means actually rendering (laying out) the entire document, and that is much more difficult.
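Reading those two values can be sketched in Python with only the standard library, since both .docx and .odt files are zip packages. This is a minimal sketch, not a definitive implementation: the function names `docx_page_count` and `odt_page_count` are made up for illustration, and the namespace URIs are assumed to be the usual ones for these formats. It does not cover the binary .doc format, and the number it returns is only whatever the application that last saved the file stored there.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs assumed for the two metadata files.
APP_NS = "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"
META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"


def docx_page_count(path):
    """Return the page count stored in docProps/app.xml, or None if absent."""
    with zipfile.ZipFile(path) as package:
        try:
            xml_bytes = package.read("docProps/app.xml")
        except KeyError:  # the part is optional
            return None
    pages = ET.fromstring(xml_bytes).find("{%s}Pages" % APP_NS)
    text = (pages.text or "").strip() if pages is not None else ""
    return int(text) if text.isdigit() else None


def odt_page_count(path):
    """Return the meta:page-count stored in meta.xml, or None if absent."""
    with zipfile.ZipFile(path) as package:
        try:
            xml_bytes = package.read("meta.xml")
        except KeyError:
            return None
    stat = ET.fromstring(xml_bytes).find(".//{%s}document-statistic" % META_NS)
    if stat is None:
        return None
    value = (stat.get("{%s}page-count" % META_NS) or "").strip()
    return int(value) if value.isdigit() else None
```

Both functions accept a file path or a file-like object, so in a web application they can be called directly on an uploaded file. A return value of None means the fallback described above (rendering the document) is needed.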

OTHER TIPS

Only for those who search for this blog entry....

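from win32com.client import Dispatch  # pywin32; needs Windows with Word installed

WD_STATISTIC_PAGES = 2  # value of Word's wdStatisticPages constant

# open Word
word = Dispatch('Word.Application')
word.Visible = False
doc = word.Documents.Open(doc_path)  # doc_path: absolute path to the document

# get number of pages
doc.Repaginate()
num_of_sheets = doc.ComputeStatistics(WD_STATISTIC_PAGES)

# close the document without saving and quit Word so no hidden process remains
doc.Close(False)
word.Quit()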
Licensed under: CC-BY-SA with attribution