Question

I want this functionality in my PHP application:

When a user uploads a document (PDF, DOCX, DOC, PPT, or PPTX), they should see the total number of pages in the document once the upload finishes.

This must work without using the exec() function.

Solution

Some formats can be handled directly in PHP. DOCX and PPTX are easy, because both are ZIP archives that keep the page or slide count in docProps/app.xml:

For Word files:

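function PageCount_DOCX($file) {
    $pageCount = 0;

    $zip = new ZipArchive();

    if ($zip->open($file) === true) {
        // docProps/app.xml holds the properties written by the application that saved the file
        if (($index = $zip->locateName('docProps/app.xml')) !== false) {
            $data = $zip->getFromIndex($index);
            $xml = new SimpleXMLElement($data);
            // $xml->Pages is a SimpleXMLElement, so cast it to an integer
            $pageCount = (int) $xml->Pages;
        }
        // Close the archive exactly once
        $zip->close();
    }

    return $pageCount;
}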

And for PowerPoint files:

function PageCount_PPTX($file) {
    $pageCount = 0;

    $zip = new ZipArchive();

    if ($zip->open($file) === true) {
        // docProps/app.xml holds the properties written by the application that saved the file
        if (($index = $zip->locateName('docProps/app.xml')) !== false) {
            $data = $zip->getFromIndex($index);
            $xml = new SimpleXMLElement($data);
            // $xml->Slides is a SimpleXMLElement, so cast it to an integer
            $pageCount = (int) $xml->Slides;
        }
        // Close the archive exactly once
        $zip->close();
    }

    return $pageCount;
}
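
Since both functions read the same file and differ only in the element name, they could be folded into one helper. What follows is a minimal sketch, not part of the original answer: the name officeAppProperty() is hypothetical, and it assumes the zip and SimpleXML extensions are enabled. It returns null instead of 0 when the value is missing, so the caller can tell "unknown" apart from "zero pages".

```php
<?php
// Hypothetical helper (illustration only): read an integer property such as
// "Pages" or "Slides" from docProps/app.xml inside an OOXML (ZIP) file.
function officeAppProperty(string $file, string $property): ?int
{
    $zip = new ZipArchive();
    if ($zip->open($file) !== true) {
        return null; // not a readable ZIP archive
    }

    // getFromName() returns false when the entry does not exist.
    $data = $zip->getFromName('docProps/app.xml');
    $zip->close();
    if ($data === false) {
        return null;
    }

    try {
        $xml = new SimpleXMLElement($data);
    } catch (Throwable $e) {
        return null; // empty or malformed XML
    }

    // The element is optional: it is only present if the saving application wrote it.
    return isset($xml->{$property}) ? (int) $xml->{$property} : null;
}
```

It would be called as officeAppProperty($path, 'Pages') for a .docx file and officeAppProperty($path, 'Slides') for a .pptx file.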

Older Office documents (DOC, PPT) are a different story. You'll find some discussion about that in the question "How to get the number of pages in a Word Document on linux?"

As for PDF files, I prefer to use FPDI, even though it requires a license to parse newer PDF file formats. You can do it simply like this:

function PageCount_PDF($file) {
    $pageCount = 0;
    if (file_exists($file)) {
        require_once('fpdf/fpdf.php');
        require_once('fpdi/fpdi.php');
        $pdf = new FPDI();                              // initiate FPDI
        $pageCount = $pdf->setSourceFile($file);        // get the page count
    }
    return $pageCount;
}

Other tips

Unfortunately, you cannot get the page count of Office files without paginating them first, and that cannot be done easily without the help of another application such as MS Office or OpenOffice. Even worse, a 10-page document created with MS Word can open as an 11-page document in OpenOffice, because the two applications paginate differently. In practice, the most reliable way to get the total number of pages of a .doc file is to use MS Word itself. You can do this through Office Automation, but it is an expensive operation because the whole document has to be paginated. You also need MS Word installed on the computer or server.

Getting the total number of pages of a PDF document is relatively easy. The page count is readily accessible in the PDF format, and most PDF parser/reader libraries give you a simple API for it.
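
To illustrate how close to the surface that information sits, here is a rough sketch that needs no library: it scans the raw file and counts /Type /Page entries. The function name pageCountPdfHeuristic() is hypothetical, and this is a heuristic, not a parser. It finds nothing when the page objects are stored in compressed object streams (possible from PDF 1.5 on), and it can overcount files saved with incremental updates. A real PDF library remains the more reliable choice.

```php
<?php
// Hypothetical helper (illustration only): estimate the page count of a PDF
// by counting "/Type /Page" dictionary entries in the raw bytes.
function pageCountPdfHeuristic(string $file): ?int
{
    if (!is_file($file) || !is_readable($file)) {
        return null;
    }

    $data = file_get_contents($file);
    if ($data === false) {
        return null;
    }

    // \b keeps "/Type /Pages" (page tree nodes) from being counted as pages.
    $count = preg_match_all('~/Type\s*/Page\b~', $data);

    // Zero matches most likely means compressed objects: report "unknown".
    return $count ? $count : null;
}
```

A null result should be treated as "could not determine", at which point falling back to a PDF library is the sensible next step.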

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow