Question

I'm on a Linux server and need to convert MS Word 97-2003 .doc files to plain-text .txt files using PHP.

I have already tried these solutions:

How to extract text from word file .doc,docx,.xlsx,.pptx php

Extract text from doc and docx

But both work properly only for the .docx format.

The issue is that when I convert .doc files, I get garbage characters at the end of the text. The number of unwanted characters varies with the length of the file. Also, if the file is fairly long, the output sometimes gets truncated.

Is there any simple way to get this converted?

Solution

I eventually settled on the following solution, which launches Antiword:

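private function doc() {
    // Escape the path so it is safe to pass to the shell.
    $file = escapeshellarg($this->filename);
    // -w 0 disables line wrapping, so each paragraph stays on one line.
    $text = shell_exec("/usr/sbin/antiword -w 0 $file");
    // utf8_encode() assumes Antiword's output is ISO-8859-1.
    return html_entity_decode(utf8_encode(trim((string) $text)));
}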
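
Since the question asks for .txt files rather than a string, the same Antiword call could be wrapped in a standalone helper that also checks the exit status. This is only a minimal sketch: the function names buildAntiwordCommand and convertDocToTxt are hypothetical, the /usr/sbin/antiword path is taken from the method above and may differ on your server, and the text is written in whatever encoding Antiword emits, with no re-encoding.

```php
<?php
// Hypothetical helpers; only the "antiword -w 0 <file>" call comes from the answer above.

/** Build the shell command, escaping both the binary path and the input file. */
function buildAntiwordCommand(string $docPath, string $binary = '/usr/sbin/antiword'): string
{
    // -w 0 disables line wrapping, so each paragraph stays on one line.
    return escapeshellarg($binary) . ' -w 0 ' . escapeshellarg($docPath);
}

/** Convert a .doc file to a .txt file and return the path that was written. */
function convertDocToTxt(string $docPath, string $txtPath): string
{
    if (!is_readable($docPath)) {
        throw new RuntimeException("Cannot read input file: $docPath");
    }

    // exec() collects stdout line by line and reports the exit status.
    exec(buildAntiwordCommand($docPath), $lines, $exitCode);
    if ($exitCode !== 0) {
        throw new RuntimeException("antiword exited with status $exitCode");
    }

    // Assumption: the output is stored as-is, in Antiword's output encoding.
    $text = trim(implode("\n", $lines));
    if (file_put_contents($txtPath, $text . "\n") === false) {
        throw new RuntimeException("Cannot write output file: $txtPath");
    }

    return $txtPath;
}
```

A call such as convertDocToTxt('/path/to/report.doc', '/path/to/report.txt') would then write the extracted text next to the original. Throwing on a non-zero exit status makes a failed conversion visible instead of silently producing an empty file.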
Licensed under: CC-BY-SA with attribution